Personal responsibility


The US Precision Medicine Initiative needs to tread carefully when revealing health and genetic data to participants.


When Stephen Damiani’s one-year-old son Massimo suddenly
lost his ability to crawl and developed other problems in
2009, doctors could not diagnose him and told his parents
that he was unlikely to live long. But Stephen, who is not a scientist, had
his family’s genomes sequenced and worked with geneticist Ryan Taft
at the University of Queensland in Australia to identify a mutated gene.

Taft linked the gene to a class of neurodegenerative disorders involving
the myelin sheath, which protects neurons. The discovery allowed
Massimo to be treated with therapies for related conditions, and it 
helped a dozen or so other families worldwide who realized that their 
children had the same disorder. 

Such commendable initiative and diligence is a testament to the 
power and promise of precision medicine — therapies targeted to
individuals. But unlike rare conditions such as Massimo’s, the
overwhelming majority of genetic and environmental factors linked to common
diseases contribute only slightly to disease risk. Hundreds of genes are 
probably involved in depression or breast cancer, for instance. 

Researchers and ethicists have spent decades struggling with the 
question of how much data to release to patients. Many argue that 
revealing information about disease risk to individuals is unnecessary
and irresponsible, owing to the potential for misinterpretation.
Stoking people's fears in this way could lead to expensive, unwarranted 
and invasive medical tests. And such information could perpetuate 
the idea of genes as destiny — a low genetic risk for heart disease, 
for instance, could be used as an excuse to eschew a healthy diet or 
exercise. The darker side of the argument, which scientists hesitate to 
state publicly, is the worry that releasing data into the public sphere 
too soon could scupper their chances for publication. 


Historically, most clinical trials have not returned individuals’ 
information. But recent years have seen a move towards openness, 
from WikiLeaks to open-access publication. The attitude that a select 
few should control others’ data is increasingly seen as paternalistic. 

Treading this careful line between professional responsibility and 
transparency is the US Precision Medicine Initiative. Backed by President
Barack Obama, who announced it in his January State of the Union
address, the project aims to collect genetic, medical, physiological and 
environmental data from 1 million people and follow them for decades 
in an attempt to link these factors with health outcomes. Its planning 
committee, which is expected to propose a design for the project this 
month, seems to be undecided on the issue: the argument came up 
repeatedly and heatedly at a July committee workshop (see page 16). 

The simplest but most restrictive approach is to inform people about 
findings only once they are discovered. Yet it seems incongruous to 
withhold health data when decisions such as organ donation are based 
on the idea that the body is the person’s legal possession. A better 
solution would be withholding data by default, but releasing them 
if participants request it. Ideally, that release should occur only with 
guidance on how to interpret the information and alongside counselling
on its significance.

Such a system would stretch the Precision Medicine Initiative’s 
underwhelming US$215-million budget, and could burden researchers 
who are searching for broader trends. Not every clinical study should 
be expected to take such an approach. But the initiative has tried to 
build its brand as an atypical, egalitarian study in which participants 
are partners with researchers. With proper cautions in place, access 
on request could demonstrate how to make good on that promise.


Parched California 


Drought highlights the state’s lack of an 
ecological strategy. 


For an unassuming little fish, the delta smelt (Hypomesus
transpacificus) has received outsize attention. In the sprawling
waterways of the Sacramento-San Joaquin river delta, which
channel precious water throughout northern California, the smelt has
served as an environmental sentry. When its numbers plummet, water
managers flood the delta with fresh water, to the outrage of farmers who
would rather have it nourishing their crops.

Yet the drought may finally do for the smelt. As California looks to
enter its fifth year of drought, officials face difficult choices on how to
manage water over the long term. So far, thanks to resilience built into
the water system in past years, Californians are weathering the shortage
remarkably well (see Amir AghaKouchak et al. Nature 524, 409-411; 
2015). Cities have opted to control their love of lush lawns, and farmers 
have shifted to efficient irrigation and other water-saving measures. 

But how long can the Golden State's lustre last? Two new reports 
(see go.nature.com/jpze97 and go.nature.com/okxrdo) highlight possible
futures should the drought continue. And the outlook is not always
that promising. Water managers cannot simply hope for a rainy winter,
perhaps prompted by El Niño. Farmers will still pump groundwater for
California's US$46-billion agricultural industry, so water tables will
continue to drop. More at risk are California’s iconic ecosystems, from
towering redwood trees to rivers teeming with salmon and trout.
Wildlife managers have arranged to keep the most crucial wetlands damp for
bird visits, and forestry managers extinguish wildfires as soon as they 
start. But such piecemeal approaches must be turned into a long-term 
strategy, much as farmers and urban planners have already done for 
their thirsty constituents. 

Otherwise, the delta smelt may vanish for good.
Rejection of GM crops is not a failure for science

Governments maintaining their antipathy for transgenic crops are sensibly
balancing public consent with scientific evidence, says Colin Macilwain.

Discuss this article online at: go.nature.com/n88nk3

Last week, Reuters reported that Germany is set to continue its
moratorium on the cultivation of genetically modified (GM) crops.

The decision will doubtless meet a well-orchestrated barrage of
criticism. When the Scottish government made the same call last month,
its decision was roundly condemned by plant biologists and scientific
leaders such as Anne Glover, former chief scientific adviser to the
president of the European Commission. Critics portray the ban as an affront
to science and to the idea that regulation should be based on evidence.

I'm a big fan of the scientific method. You won't find me sitting
in an Airbus 320 thanking the Lord for keeping the aircraft aloft. I
happily attribute its successful flight to the scientists and engineers
who mastered fluid dynamics. I also support the general principle of
evidence-based policy.

Yet I’m relaxed about the pending decisions of Scotland, Germany,
France, Italy and others to stand up to corporate pressure and keep GM
crop technology out of the European countryside. I await with interest
England’s response to the deal that the European Union made last
December that allows its member states to make their own choices on
licensing GM crops.

Whatever these nations decide, the stakes are not as high as they once
were. When the United States started to license GM soya beans and
maize (corn) 20 years ago, many crop producers thought that global
acceptance of the technology would rest heavily on European acceptance.
That is probably no longer true. The global acreage of GM crops has
grown consistently without broad acceptance from Europe. It is now
topping out. Last year, it grew by only around 3%, according to industry
figures, to 181 million hectares — a little more than one-tenth of the
1.5 billion hectares of land that the United Nations estimates to be
under crop cultivation.

Five-sixths of that GM acreage is in the Americas. The rest consists
mostly of non-food crops (mainly cotton) grown in India and China.
Little of the harvest is in nations that need improved yields to feed
themselves. Twenty years in, the GM strains currently under cultivation
are still best suited to the needs of large-scale industrial farmers who
can afford the seeds and inputs that accompany them. Whatever Europe
decides, the rest of the world isn’t waiting to follow suit.

And this time, Europe’s debate about GM crop cultivation isn’t
really over GM crops themselves, but over how nations should
assess and manage risk. When Europe turned its back on GM crops
15 years ago, the pro-GM lobby warned that this signalled a continent in
crisis, one unwilling to embrace the future. But there has been scant
indication since then that Europe is technology-averse. It has not
slowed itself down or tied itself up by rejecting nanotechnology-based
wound-dressings or mobile phones, of which it was the world’s fastest
adopter.

Despite the GM episode, evidence-based policy is alive and kicking in 
Europe. But good risk management involves early communication with 
the public and the careful weighing of many factors, not just scientific 
risk assessment. In general, however, industry — which usually holds 
most of the relevant data — favours scientific risk assessment as the 
be-all and end-all of regulation (see Nature 508, 289; 2014).
Environmentalists — even gentle ones, such as the European Commission and
former US vice-president Al Gore — prefer the precautionary principle, 
which places the burden of proof on the innovator. 

In practice, all governments have to walk a line between the two. 
But where to draw that line? In Europe, especially 
in countries that value the provenance of food, 
much of the general public doesn’t want GM 
foods. The jury, too, remains out on their ecological
impacts (see Nature 497, 24-26; 2013).
Should they nonetheless be grown because the 
data say that they’re safe to eat? Call me naive, but 
given the threadbare state of our democracy, it 
doesn't do to override public concern in that way. 

In the United States, the key regulatory decisions
were made in 1995, with scant public input.
They clicked into place on the basis of ‘substantial
equivalence’, which holds that GM foods are
substantially the same as their component parts.

Substantial equivalence was the original sin
that undermined public confidence in GM technology,
and advocates have been over-compensating
for it ever since. Genetic modification is a
blockbuster technology with a broad ability to mix and match genes;
its use or misuse has profound implications for global ecology and the
food supply. It is in no sense ‘substantially equivalent’ to plant breeding.

That sin may shortly be expunged. On 2 July, John Holdren, science 
adviser to US President Barack Obama, directed regulators to revisit 
the US framework for regulating agricultural biotechnology. Holdren 
is promising simpler rules for small producers, but also more
transparency. Many US consumers have grown sceptical of the technology;
in April 2014, Vermont became the first state to mandate labelling of 
products that contain GM crops. (The US House of Representatives has 
responded by passing a bill that would prohibit such state provisions.) 

Some critics still hope that universal labelling on food packaging 
means the beginning of the end for GM crops. More probably, it will 
mark the end of the beginning — if it prises out a fresh approach from
the scientific community and the agricultural biotechnology industry
to come clean with the public on what they’re doing.


Colin Macilwain writes about science policy from Edinburgh, UK. 
e-mail: cfmworldview@googlemail.com 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


A low-power light 
amplifier 


A semiconductor device can 
amplify the tiny signal from 
incoming photons using 
much less power and creating 
much less noise than current 
methods. 

Previously, converting a 
signal from photons into 
a usable electrical signal 
required the use of two 
devices running at relatively 
high voltages — one device 
to convert the photon to an 
electrical signal and another to 
amplify it. 

Yu-Hwa Lo at the University 
of California, San Diego, 
and his colleagues exploited 
a different amplification 
mechanism to consolidate the 
process into a single device. By 
engineering a special junction 
between layers of silicon 
with two different kinds of 
impurities, the team amplified 
the light signals by more than 
a factor of 4,000 and induced 
30 times less noise in the signal 
than conventional methods. 
Appl. Phys. Lett. 107, 053505
(2015) 


Pig-farming history traced

Domesticated pigs (pictured) routinely interbred with wild boars — contrary to common assumptions that humans kept their animals isolated.

Humans domesticated pigs from wild boars independently in Anatolia (modern-day Turkey) and East Asia around 9,000 years ago. To learn about pig-population histories, a team led by Laurent Frantz at the University of Oxford, UK, analysed the genomes of more than 600 modern pigs and wild boars. After initial domestication in Anatolia, the ancestors of European pigs interbred with at least two different populations of wild boars that ranged between Europe and Anatolia. Pigs from East Asia seem to have interbred with local boars too. Despite this wild mixing, domestic pig genomes show signs of positive selection at regions that include genes involved in behaviour and anatomy.

The researchers propose that ancient herders repeatedly selected pigs with useful traits, counteracting the effects of the wild boar genes.

Nature Genet. http://dx.doi.org/10.1038/ng.3394 (2015)


Finding a limit for deep-sea fishing

The negative ecological impact of trawling for fish at depths of more than 600 metres outweighs the commercial benefits.

Deep-sea fish are particularly vulnerable to overfishing because populations grow slowly. This has led to calls for a maximum depth for trawling, but it has not been clear what that limit should be. By examining species from scientific surveys of the northeast Atlantic over 35 years, Jo Clarke at the University of Glasgow, UK, and her colleagues found that the proportion of caught fish with no commercial value increased significantly below 600 metres.

Although this evidence supports a 600-metre depth limit in the northeast Atlantic, its relevance to other fishing areas is untested.

Curr. Biol. http://doi.org/66n (2015)
For more on this story, see go.nature.com/yjedl3


ECOLOGY


Coral foe becomes a friend

Seaweed often inhibits the growth of corals, but it can help them when they are faced with a coral-eating starfish.

Seaweed can suppress coral growth by shading it from sunlight and by releasing toxic chemicals. Cody Clements and Mark Hay at the Georgia Institute of Technology in Atlanta surrounded branches of a coral species (Montipora hispida) near Fiji with varying numbers of fronds of a common brown alga (Sargassum polycystum). After 4 months, they found that the growth rate of coral branches unencumbered by the seaweed was 2.7 times that of corals with 8 surrounding fronds (the highest number tested).

However, coral branches surrounded by four or more seaweed fronds were rarely attacked by the crown-of-thorns starfish (Acanthaster planci), which devoured all the exposed corals.

Proc. R. Soc. B 282, 20150714 (2015)
STEM CELLS


How stem cells tell 
signal from noise 


Mouse stem cells switch
on a neural developmental
program when the activity of a
specific gene lasts for a certain
length of time.

Cells are flooded with many 
signals from gene expression. 
To find out how cells pick 
out the important ones from 
the background noise, Matt 
Thomson and his colleagues 
at the University of California, 
San Francisco, engineered
mouse embryonic stem cells so
that the Brn2 gene turned on
when the cells were exposed to light.
When the activity of this gene 
reached a specific duration 
and level, the stem cells rapidly 
began specializing into neural 
progenitor cells. 

Mathematical modelling 
showed that a positive feedback 
network in the Brn2 circuitry 
helps to ensure that the Brn2 
signal rises above the noise. 

Cell Systems 1, 117-129 (2015)


METEOROLOGY 


Big coastal 
storms to come 


Three major coastal cities on 
different continents could get 
walloped by tropical cyclones 
during the next century 
because of climate change. 

Ning Lin of Princeton 
University in New Jersey 
and Kerry Emanuel of the 
Massachusetts Institute of 
Technology in Cambridge 
ran statistical models of how 
storms form near the cities of 
Dubai, Cairns in Australia and 
Tampa, Florida — all of which 
are vulnerable to rising sea 
levels. With climate change, 
storm surges could reach as 
high as 6 metres in Dubai 
during the next 100 years; the 
city has never experienced a 
tropical cyclone. Storm surges 
would be slightly lower in 
Cairns and Tampa, but still 
greater than the levels those 
cities have seen before. 

The authors calculate that
by the end of the century, the
annual probability of such
powerful storms in Tampa
would increase from about
1 in 10,000 now to between
1 in 2,500 and 1 in 700.

Nature Clim. Change http://dx.doi.org/10.1038/nclimate2777 (2015)


Cane toads wage 
chemical war 


Invasive toads in Australia 
could be turned against each 
other to control the population. 

Richard Shine at the 
University of Sydney in 
Australia and his colleagues 
grew cane toad (Rhinella 
marina) tadpoles and embryos 
together in containers in the 
laboratory, and separated 
them with a mesh partition. 
They found that the tadpoles 
suppressed embryo growth
by 33-84% and reduced the embryos’
survival to less than 5%.

The tadpoles seem to 
produce a chemical that blocks 
embryo growth, allowing them 
to outcompete the embryos. 
The practice of removing 
tadpoles from breeding ponds 
could actually boost the 
growth of embryos; instead, 
tadpoles could be kept in mesh 
containers in the pond to 
stymie the embryos’ growth, 
the authors say. 

J. Appl. Ecol. http://doi.org/635 
(2015) 


Genetic switch stores up fat

A variation in a genetic region associated with obesity causes fat to be stored rather than burned.

Melina Claussnitzer at the Beth Israel Deaconess Medical Center in Boston, Massachusetts, Manolis Kellis at the Massachusetts Institute of Technology in Cambridge and their colleagues studied fat cells from 52 people with a version of the FTO genetic region that is associated with obesity and from 48 people with the non-risk version. The team found a single-nucleotide change in the risk-associated FTO region that boosted expression of the genes IRX3 and IRX5, which decreased the amount of energy burned and dissipated as heat. In fat precursor cells, this change resulted in the development of more energy-storing white fat cells and fewer energy-burning beige fat cells.

Inhibiting Irx3 in mice caused the animals to lose weight without a change in appetite or exercise.

N. Engl. J. Med. http://doi.org/6z5 (2015)


Apes get by in degraded habitat

Endangered chimpanzees could be adapting to landscapes that have been broken up by human activity.

Eastern chimpanzees (Pan troglodytes schweinfurthii) were thought to exist in low numbers in an area of Uganda where forest has been fragmented by farms and villages (pictured). Maureen McCarthy and Linda Vigilant of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, and their colleagues collected and genetically analysed hundreds of chimp droppings from about 630 square kilometres of fragmented habitat.

They estimate that some 260 chimps in 9 communities live in about 1,200 square kilometres of what seems to be marginal habitat at best — more than 3 times the number of chimps that were previously estimated to live in this habitat.

The authors suggest that these and other rare species might be more adaptable — at least in the short term — than was thought.

BMC Ecol. http://doi.org/66q (2015)


Popular topics on social media

Lifetime collaborators reap benefits

Few researchers can do science single-handedly, making collaborations crucial. According to an analysis in Proceedings of the National Academy of Sciences, long-term collaborations pay big dividends, yielding a 17% boost in citation rate for resulting papers. Using a metric based on publications between collaborators over time, the analysis identified a group of strong partnerships, or ‘super ties’, that produced an unusually high number of papers in a given period. Papers from authors with such ties receive an average of 21 more citations in biology and 8 more citations in physics than those without a super tie. The findings were well received by observers on social media. “Collaboration works. Pick your collaborators carefully, they can stay with you for a long time,” tweeted Wouter Gerritsma, an information specialist at Vrije University in Amsterdam. With those rewards in mind, Jeremy Borniger, a neuroscience PhD student at Ohio State University in Columbus, used Twitter to ask: “Who wants to be my science life partner?”

Proc. Natl Acad. Sci. USA 112, E4671-E4680 (2015)

For more on popular papers: go.nature.com/hkiong


NATURE.COM
For the latest research published by Nature visit:
www.nature.com/latestresearch
SEVEN DAYS


Explosion arrests 


The Chinese authorities 
have detained 12 people and 
are investigating another 11 
in relation to a warehouse 
explosion in Tianjin earlier 
this month, state media report. 
Among those under scrutiny 
are senior executives of Rui 
Hai International Logistics, 
which owns the warehouse, 
the head of the Tianjin 
Municipal Transportation 
Commission, the president 
of Tianjin Port and a senior 
official with the Chinese 
transport ministry. As of
31 August, the disaster’s death
toll was 158. Questions have
arisen over the handling
and storage of chemicals
implicated in the explosions.


Iran-science uplift

Iran plans to boost international science collaboration once sanctions are lifted, Mohammad Farhadi, the country’s science minister, said on 27 August. Speaking to Iran’s Islamic Republic News Agency, Farhadi reported that preparations are under way to increase the country’s cooperation with foreign universities, including the development of academic exchange programmes and a visit from an Austrian university delegation. Sanctions have hampered scientists’ movement in and out of Iran, as well as the country’s involvement in international projects, but will end once a nuclear-deal agreement made on 14 July is implemented.


FUNDING

Australian merger

The digital-research arm of Australia’s Commonwealth Scientific and Industrial Research Organisation (CSIRO) will merge with National Information Communications Technology Australia (NICTA) to form a CSIRO digital-innovation team, called Data61. Announced on 28 August, the merger follows the Australian government's decision to halt NICTA’s funding after June 2016. Funds for Data61 will come from CSIRO’s already-stretched budget, itself subjected in 2014 to cuts of Aus$115 million (US$82 million) over four years. The merger could result in the loss of as many as 200 jobs.


Slimy sea creature rears head once more

Thirty years after its last sighting, a rare species of nautilus (Allonautilus scrobiculatus) has been spotted near Papua New Guinea. A type of cephalopod, the species is distantly related to squid and octopuses, and was first discovered in the 1980s by Peter Ward of the University of Washington in Seattle and Bruce Saunders from Bryn Mawr College, Pennsylvania. On 25 August, the University of Washington announced that Ward had again glimpsed the species, with its distinctive, ‘slimy’ shell covering (pictured), while visiting Papua New Guinea in July.


Grant cash rejected

The University of Florida in Gainesville announced on 27 August that a US$25,000 grant from agriculture giant Monsanto, originally earmarked for a science outreach programme, will instead be given to the campus food bank. The statement comes after Florida plant scientist Kevin Folta faced public threats over his acceptance of the money, which Nature first reported on 6 August (see Nature http://doi.org/66p; 2015). Monsanto refused to accept the university's offer to return the money. There is no suggestion of wrongdoing or scientific misconduct by Folta.


Bear-brain clues

Knut, the celebrity polar bear hand-reared at the Berlin Zoological Garden, had an autoimmune brain disease, according to the latest investigation into his death. Knut drowned in 2011, aged 4, after suffering an epileptic fit and falling into a pool. An autopsy at the time blamed an unspecified encephalitis — brain inflammation.

An analysis published on 27 August found antibodies that signify anti-NMDA receptor encephalitis, in which the immune system harms nerve cells (H. Prüss et al. Sci. Rep. 5, 12805; 2015). The treatable disease was first reported in humans in 2007, but was unknown in animals. Knut’s case suggests that autoimmune brain diseases may be more common in mammals than was thought.


NASA's icy choice

NASA has chosen New Horizons’ next probable target: an icy body called 2014 MU69. The spacecraft will fly by the object in 2019, making it the mission's second destination after the historic encounter with Pluto in July. Kuiper belt object 2014 MU69 is about 45 kilometres across, and New Horizons will fly within 12,000 kilometres of its surface. According to the 28 August announcement, the spacecraft will ignite its engines in a series of four burns, beginning in late October this year, to set itself on course for the fly-by. See go.nature.com/ojobwg for more.


E-waste woes 

Only 35% of Europe’s annual 9.5 million tonnes of electrical and electronic waste is legally disposed of, according to a study funded by the European Union and released on 30 August. The remaining 65% is either exported, recycled under non-compliant conditions, scavenged for the valuable elements in the waste or thrown away with the ordinary rubbish. Electronic waste contains toxic metals including mercury, cadmium and chromium, as well as valuable materials that can be reused, including gold, silver, palladium and rare-earth metals. The report offers detailed recommendations, with particular emphasis on educating consumers.


Oliver Sacks dies 


Neurologist and author Oliver Sacks (pictured) died at his home in Manhattan on 30 August, aged 82. Sacks worked at several clinics in New York, and much of his writing revolved around the curious cases that he encountered. He found worldwide fame when his 1973 book Awakenings — which described how he roused encephalitis patients from a coma-like state with the Parkinson's disease drug L-dopa — was made into a film in 1990. Last month, he created the Oliver Sacks Foundation to promote understanding of the human brain through narrative non-fiction like his own.


Misconduct ruling 


Surgeon Paolo Macchiarini, who is famous for implanting synthetic tracheas into humans, was cleared of scientific misconduct charges on 28 August by the Karolinska Institute in Stockholm, where Macchiarini is a visiting professor. The university’s vice-chancellor overruled a finding by an independent investigator that seven papers contained unsupported assertions of the success of artificial grafts. The decision to overrule was based on more than 1,000 pages of documents that Macchiarini and his co-authors submitted after the independent report was released. See go.nature.com/ynxom8 for more.


BUSINESS

Monsanto backs off

The world’s largest seller of seeds, Monsanto, announced on 26 August that it has abandoned its US$4.65-billion takeover attempt of Swiss competitor Syngenta after months of discussions. Executives of pesticide specialist Syngenta rejected multiple offers, despite pressure to negotiate a deal from some of their shareholders. The proposed merger was opposed by farmers’ unions, who said that the move would reduce competition and push up seed and pesticide prices at a time when farmers’ profits are under pressure from low food prices.


Sanofi Googled

The technology giant Google, based in Mountain View, California, announced on 31 August that it will collaborate with Paris-based drugmaker Sanofi on ways to better monitor and treat people with diabetes. Sanofi is the second pharmaceutical company that Google has partnered with. In July 2014, it revealed a licensing deal with Novartis, based in Basel, Switzerland, for Google's ‘smart lens’ technology, which is designed to monitor glucose in tears. Financial terms of the Sanofi deal were not disclosed.


Gene-therapy trial

A first-of-its-kind clinical trial to test a treatment for a degenerative disease that causes blindness was given a green light by the US Food and Drug Administration on 24 August. The trial will test a treatment for retinitis pigmentosa devised by RetroSense Therapeutics of Ann Arbor, Michigan. The gene therapy will attempt to deliver a gene encoding a light-sensitive protein called channelrhodopsin-2, which the firm hopes will lead retinal cells to make new light-sensing proteins. The trial will start by the end of the year.


TREND WATCH

Poor decision-making, improper camp placement and lack of good forecasting were to blame for 75% of avalanche fatalities on the world’s highest mountains, says a report from the University of British Columbia in Vancouver, Canada (D. M. McClung Ann. Glaciol. http://doi.org/66c; 2015). Analysis of 10,000 mountaineering reports for Asian mountains during 1895-2014 found that 300 people died in avalanches while trying to climb peaks higher than 8,000 metres, one-third of the fatalities.

Chart: Asian mountains’ terrible toll. Camp placement in high-risk areas and lack of weather forecasting are behind 75% of avalanche deaths on peaks higher than 8,000 metres. Peaks shown: Everest (8,850 m), K2 (8,611 m), Kangchenjunga (8,586 m), Lhotse (8,516 m), Makalu (8,463 m), Cho Oyu (8,201 m), Dhaulagiri (8,167 m), Manaslu (8,163 m), Nanga Parbat (8,125 m), Annapurna (8,091 m), Gasherbrum I (8,068 m), Broad Peak (8,047 m), Gasherbrum II (8,035 m), Shisha Pangma (8,013 m). Source: Ann. Glaciol.


3-6 SEPTEMBER
Neurologists gather at the World Congress on Neuro Therapeutics: Dilemmas, Debates, Discussions, in Prague.
go.nature.com/zmysnp

5-6 SEPTEMBER
Robotics engineers battle it out to find who has the best rover, at the European Rover Challenge near Chęciny, Poland.
roverchallenge.eu/en

5-8 SEPTEMBER
Molecular biologists convene at EMBO’s 6th meeting, in Birmingham, UK.
the-embo-meeting.org


The New Horizons craft photographed Pluto’s atmosphere, backlit by the Sun, as the probe sailed away from the dwarf planet in mid-July.


Pluto pressure data pose an 
atmospheric conundrum 


Discrepancy arises between New Horizons and Earth-based measurements. 


BY ALEXANDRA WITZE 


NASA’s New Horizons spacecraft solved many mysteries about Pluto when it flew past the dwarf planet in July. But as mission controllers prepare to steer the probe to its next rendezvous, planetary scientists are working to understand a puzzling result: an atmospheric pressure at Pluto’s surface that is much lower than indicated by measurements obtained from Earth.

Some have suggested that Pluto’s atmos- 
pheric pressure is dropping as the dwarf 
planet's orbit carries it farther from the Sun and 
gases freeze out and fall to the surface as snow. 
But the most recent data taken from Earth 
suggest no such dramatic transformation. 
“I feel pretty secure that Pluto isn't starting
to freeze out,” says Eliot Young, a planetary
scientist at the Southwest Research Institute
(SwRI) in Boulder, Colorado.

On 29 June, a few weeks before the 
fly-by, Young organized astronomers across 
New Zealand and Australia to watch Pluto 
as it passed in front of a distant star. Tracking 
how the star’s light faded during the passage 
provided information on how much gas is in 
Pluto’s atmosphere. Using the same method, 
planetary scientists have seen the atmosphere 
grow denser since 1988 — and analysis of 
the 29 June observations shows that the
trend remains intact. Young calculates that
the current atmospheric pressure at Pluto's
surface is 22 microbars (2.2 pascals), or
about 22-millionths the pressure at sea level on
Earth.
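The unit conversions behind these figures are easy to check. The Python sketch below is an illustration only and is not part of any mission software. It assumes the standard definitions 1 bar = 100,000 pascals (so 1 microbar = 0.1 pascals) and a standard sea-level pressure of 101,325 pascals. It also compares the Earth-based value with the 5-microbar figure that New Horizons returned.

```python
# Unit check for the Pluto surface-pressure figures quoted in the text.
# Assumes standard definitions: 1 bar = 100,000 Pa, so 1 microbar = 0.1 Pa,
# and Earth's standard sea-level pressure is 101,325 Pa.

SEA_LEVEL_PA = 101_325.0


def microbar_to_pascal(pressure_microbar: float) -> float:
    """Convert a pressure in microbars to pascals (1 microbar = 0.1 Pa)."""
    return pressure_microbar / 10.0


earth_based_pa = microbar_to_pascal(22)   # occultation estimate from Earth
new_horizons_pa = microbar_to_pascal(5)   # radio measurement from the fly-by

print(f"Earth-based: {earth_based_pa:.1f} Pa")
print(f"New Horizons: {new_horizons_pa:.1f} Pa")
print(f"Fraction of sea level: {earth_based_pa / SEA_LEVEL_PA:.1e}")
print(f"Ratio of the two estimates: {earth_based_pa / new_horizons_pa:.1f}")
```

The script prints 2.2 Pa and 0.5 Pa. The Earth-based value works out to roughly 2.2e-05 of sea-level pressure, about 22 millionths, and the two estimates differ by a factor of about 4.4.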

But on 14 July, New Horizons found Pluto’s
surface pressure to be much lower than
that: just 5 microbars. “How we
link the two, we're still working on,” says
Cathy Olkin, a deputy project scientist for
New Horizons at SwRI.

Part of the discrepancy between the 
spacecraft’s observation and past estimates 
could be due to the indirect way that 
astronomers derive the value from Earth- 
based observations. These studies measure 
pressure some 50-75 kilometres above the 
dwarf planet’s surface, and researchers use 
assumptions about the atmosphere’s struc- 
ture to calculate what that number translates 
to at the ground. 

By contrast, New Horizons measured surface pressure directly by determining how strongly radio waves, beamed from antennas on Earth, bent as they passed through Pluto’s atmosphere and arrived at the spacecraft on the far side of the dwarf planet.


The next chal- 
lenge is to figure out which of several 
competing models that describe Pluto's 
atmosphere can best reconcile the Earth- 
based measurements and what New 
Horizons measured at the surface. 

“We may be looking at the first test of 
these models, not an atmospheric collapse 
or some spectacularly freaky physics,” 
says Ivan Linscott, a physicist at Stanford 
University in California and co-leader of 
the New Horizons radio measurement. 
“The jury’s still out.”

Clues may yet come from New Horizons. 
About 95% of the data collected in its Pluto 
fly-by, including much of the informa- 
tion from the radio measurement, is still 
on board. Slow transmission speeds mean 
that the team will have to wait months for 
the rest of it to arrive. The transmission 
of images, which has been on pause since 
soon after the 14 July fly-by, will resume on 
5 September. 

And in late October, mission controllers 
will ignite the spacecraft’s engines in a 
series of burns to set it on course for its next 
destination: an object called 2014 MU69, 
which is about 45 kilometres across and 
lies in the Kuiper belt, a collection of 
small bodies orbiting beyond Neptune. 
New Horizons is set to pass within about
12,000 kilometres of the object on New
Year's Day 2019.
QUANTUM PHYSICS


Toughest test yet for 
quantum ‘spookiness’ 


Experiment plugs loopholes in previous demonstrations of 
‘action at a distance’ and could make data encryption safer. 


BY ZEEYA MERALI 


It’s a bad day both for Albert Einstein and for hackers. Physicists say that they have made the most rigorous demonstration yet of the quantum ‘spooky action at a distance’ effect that the German physicist famously hated, in which manipulating one object instantaneously seems to affect another one far away.

The experiment could be the final nail in the coffin for theories that are more intuitive than standard quantum mechanics. It could also enable engineers to develop a new suite of ultrasecure cryptographic devices. “From a fundamental point of view, this is truly history-making,” says Nicolas Gisin, a quantum physicist at the University of Geneva in Switzerland.

In quantum mechanics, objects can be in multiple states simultaneously: an atom can be in two places at once, for example. Measuring an object forces it to snap into a well-defined state. The properties of different objects also can become ‘entangled’, meaning that when one such object is measured, the state of its entangled twin also becomes set.

This idea galled Einstein because it seemed that this ghostly influence would travel instantaneously, contravening the universal rule that nothing can travel faster than the speed of light. He proposed that quantum particles do have set properties, called hidden variables, before they are measured, and that even though those variables cannot be accessed they pre-program entangled particles to behave in correlated ways.

John Bell devised a test to show that nature does not ‘hide variables’ as Einstein had proposed.

In the 1960s, physicist John Bell proposed a test (ref. 1) that could discriminate between Einstein's hidden variables and ‘spooky action at a distance’. He calculated that hidden variables can explain correlations only up to some maximum limit. If that level is exceeded, then Einstein’s model must be wrong.
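Bell's limit is easiest to see in the form that most experiments use: the CHSH inequality of Clauser, Horne, Shimony and Holt, a later reformulation rather than Bell's original 1964 expression. Each side picks one of two measurement settings and records an outcome of +1 or -1, and a quantity S combines four of the resulting correlations. The Python sketch below is an idealized illustration, not the analysis of any experiment described here. It brute-forces every set of pre-programmed answers to show that hidden variables cap |S| at 2. It then evaluates the textbook quantum prediction for a spin singlet at one standard choice of angles (an assumption made for the example), which reaches 2√2, about 2.83.

```python
import itertools
import math


def chsh(e00: float, e01: float, e10: float, e11: float) -> float:
    """CHSH combination S = E(a0,b0) + E(a0,b1) + E(a1,b0) - E(a1,b1)."""
    return e00 + e01 + e10 - e11


def hidden_variable_bound() -> int:
    """Largest |S| achievable with pre-programmed (+1/-1) answers.

    Each pair carries fixed outcomes for both possible settings on each side.
    Randomized hidden variables are mixtures of these deterministic strategies,
    so they cannot exceed the best deterministic value.
    """
    best = 0
    for a0, a1, b0, b1 in itertools.product((1, -1), repeat=4):
        s = chsh(a0 * b0, a0 * b1, a1 * b0, a1 * b1)
        best = max(best, abs(s))
    return best


def singlet_correlation(angle_a: float, angle_b: float) -> float:
    """Idealized quantum correlation for spin measurements on a singlet pair."""
    return -math.cos(angle_a - angle_b)


def quantum_chsh() -> float:
    """|S| for the singlet at one standard choice of measurement angles."""
    a0, a1 = 0.0, math.pi / 2
    b0, b1 = math.pi / 4, -math.pi / 4
    s = chsh(
        singlet_correlation(a0, b0),
        singlet_correlation(a0, b1),
        singlet_correlation(a1, b0),
        singlet_correlation(a1, b1),
    )
    return abs(s)


print("hidden-variable limit:", hidden_variable_bound())   # 2
print("quantum prediction:", round(quantum_chsh(), 4))      # 2.8284
```

Checking only the deterministic strategies is enough here. Any hidden-variable model is an average over such strategies, and an average cannot exceed its largest term.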

The first experiment suggesting that this was 
the case was carried out in 1981 (ref. 2). Many 
more have been performed since, always com- 
ing down on the side of spookiness — but each 
has had loopholes that meant that physicists 
have never been able to fully close the door on 
Einstein's view. Experiments that use entangled 
photons are prone to the ‘detection loophole’: 
not all photons produced in the experiment are
detected, and sometimes as many as 80% are
lost. Experimenters therefore have to assume 
that the photons they capture are representa- 
tive of the entire set. 

To get around the detection loophole, physi- 
cists often use particles that are easier to keep 
track of than are photons, such as atoms. But 
it is tough to place atoms far apart without 
destroying their entanglement. This opens 
the ‘communication loophole’: if the entangled 
atoms are too close together, then, in princi- 
ple, measurements made on one could affect 
the other without violating the speed-of-light 
limit. 


ENTANGLEMENT SWAPPING 

In the latest paper (ref. 3), which was submitted to
the arXiv preprint repository on 24 August 
and has not yet been peer reviewed, Ronald 
Hanson of Delft University of Technology 
and his colleagues report the first Bell experi- 
ment that closes both the detection and the 
communication loopholes. The team used 
a cunning technique called entanglement 
swapping to combine the benefits of using 
both light and matter. The researchers started 
with two unentangled electrons sitting in dia- 
mond crystals in different labs on the Delft 
campus, 1.3 kilometres apart. Each electron 
was individually entangled with a photon,
and both those photons were then zipped
to a third location. There, the two photons 
were entangled with each other — and 
this caused both their partner electrons to 
become entangled, too. 

This did not work every time. In total, the 
team managed to generate 245 entangled 
pairs of electrons over the course of nine 
days. The team’s measurements exceeded 
Bell’s bound, once again supporting the 
standard quantum view. Moreover, the 
experiment closed both loopholes at once: 
because the electrons were easy to monitor, 
the detection loophole was not an issue, and 
they were separated by enough distance to 
also close the communication loophole. 
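The communication loophole comes down to timing. Each measurement must be finished before a light signal could travel from one lab to the other. The Python sketch below computes that window for the 1.3-kilometre separation on the Delft campus. The measurement durations in the usage lines are hypothetical. The check is a simplification that assumes both labs start measuring at the same moment and ignores relativistic frame effects.

```python
# Speed of light in vacuum, in metres per second (exact by definition of the metre).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def light_travel_time(distance_m: float) -> float:
    """Time in seconds for a light signal to cross distance_m."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


def closes_communication_loophole(distance_m: float, measurement_time_s: float) -> bool:
    """Simplified check: True if a whole measurement (choosing the setting and
    recording the result) finishes before light could reach the other lab.

    Assumes both labs start measuring at the same moment in the lab frame.
    """
    return measurement_time_s < light_travel_time(distance_m)


window = light_travel_time(1300.0)   # labs 1.3 km apart
print(f"light-travel window: {window * 1e6:.2f} microseconds")   # about 4.34

# Hypothetical measurement durations, for illustration only.
print(closes_communication_loophole(1300.0, 3e-6))   # True
print(closes_communication_loophole(1300.0, 6e-6))   # False
```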

“It is a truly ingenious and beautiful experiment,” says Anton Zeilinger, a
physicist at the Vienna Centre for Quantum 
Science and Technology. 

Matthew Leifer, a quantum physicist 
at the Perimeter Institute for Theoretical 
Physics in Waterloo, Canada, says that he 
would not be surprised to see one of the 
authors of the paper share a Nobel prize in 
the next few years. “It’s that exciting.”

A loophole-free Bell test also has 
implications for quantum cryptography, 
says Leifer. Companies already sell sys- 
tems that use quantum mechanics to block 
eavesdroppers. The systems produce entan- 
gled pairs of photons, sending one photon 
in each pair to one user and the other pho- 
ton to a second user. The two users then 
turn these photons into a cryptographic key 
that only they know. 

But loopholes — and the detection 
loophole in particular — mean that mali- 
cious companies could sell devices that 
fool users into thinking that they are get- 
ting quantum-entangled particles, when 
they are instead being given keys that the 
company can use to spy on them. In 1991, 
quantum physicist Artur Ekert observed (ref. 4)
that integrating a Bell test into the system 
would ensure a genuine quantum process. 
For this to be valid, however, the Bell test 
must be free of any loopholes. The Delft 
experiment “is the final proof that quantum cryptography can be unconditionally
secure”, says Zeilinger.
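A toy version of the key-building step may help. The Python sketch below is loosely modelled on Ekert's 1991 scheme (ref. 4) and heavily simplified. For every pair, each user picks a measurement setting at random. Rounds in which the settings match become key bits. The remaining rounds are set aside for a Bell test that checks whether the source is genuinely quantum. Several details are illustrative assumptions: the setting labels, the bit values, and the premise that matching settings give perfectly anticorrelated outcomes, as for an ideal singlet pair. A real system also needs error correction and privacy amplification, which are omitted here.

```python
from typing import List, Sequence, Tuple


def sift_key(
    alice_settings: Sequence[str],
    bob_settings: Sequence[str],
    alice_bits: Sequence[int],
    bob_bits: Sequence[int],
) -> Tuple[List[int], List[int], List[int]]:
    """Split measurement rounds into key bits and Bell-test rounds.

    Rounds where both users chose the same setting become key bits. For an
    ideal singlet pair those outcomes are perfectly anticorrelated, so Bob
    flips his bit to match Alice. Indices of all other rounds are returned
    so that they can be used for a Bell test of the source.
    """
    key_alice: List[int] = []
    key_bob: List[int] = []
    bell_rounds: List[int] = []
    for i, (sa, sb) in enumerate(zip(alice_settings, bob_settings)):
        if sa == sb:
            key_alice.append(alice_bits[i])
            key_bob.append(1 - bob_bits[i])  # undo the singlet anticorrelation
        else:
            bell_rounds.append(i)
    return key_alice, key_bob, bell_rounds


# Made-up settings and outcomes for four rounds.
ka, kb, bell = sift_key(["x", "z", "x", "z"], ["x", "x", "x", "z"],
                        [0, 1, 1, 0], [1, 0, 0, 1])
print(ka, kb, bell)   # [0, 1, 0] [0, 1, 0] [1]
```

If the Bell-test rounds failed to exceed the classical limit, the users would have reason to suspect that the device was not delivering genuinely entangled particles.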

In practice, the technique will be hard to 
implement, because so far it has generated 
entangled electrons at a very slow pace. 

Zeilinger also notes that there remains 
a last, somewhat philosophical, loophole, 
first identified by Bell himself: the possi- 
bility that hidden variables could somehow 
manipulate the experimenters’ choices of 
what properties to measure, tricking them 
into thinking quantum theory is correct.

1. Bell, J. S. Physics 1, 195-200 (1964).
2. Aspect, A., Grangier, P. & Roger, G. Phys. Rev. Lett. 49, 91-94 (1982).
3. Hensen, B. et al. Preprint available at http://arxiv.org/abs/1508.05949 (2015).
4. Ekert, A. Phys. Rev. Lett. 67, 661-663 (1991).
The next-generation synchrotron at Lund in Sweden has passed its first test.


X-ray science 
gets an upgrade 


Swedish synchrotron promises super-bright beams and will
open up new avenues for researchers. 


BY DAVIDE CASTELVECCHI 


In what researchers hope marks the start of a new era for X-ray science, electrons have begun circulating in a next-generation synchrotron in Lund, Sweden. This machine promises to lower the costs of X-ray-light sources around the world, while improving their performance and enabling experiments that were not possible before.

Synchrotrons are particle accelerators that produce X-rays used in research ranging from structural biology to materials science. At 10 p.m. local time on 25 August, the first bunches of electrons began circulating inside a new 528-metre-long, 3-gigaelectronvolt (GeV) machine at the MAX IV facility in Lund, project director Christoph Quitmann told Nature. MAX IV is the first ‘fourth-generation’ synchrotron in the world.

“Getting the first beam is an absolutely crucial first step” in demonstrating fourth-generation technology, says Chris Jacobsen, an X-ray physicist at the Argonne National Laboratory in Lemont, Illinois. MAX IV, he says, is “leading the world towards a new path in synchrotron light sources”.

In synchrotrons, bunches of electrons 
circulate at nearly the speed of light inside a 
ring-shaped vacuum tube. Powerful ‘bending’ 
magnets steer the electrons around the rings, 
and ‘focusing’ magnets push them together 
against their mutual repulsion. The electrons 
then pass through special magnets that shake 
them sideways to produce pulses of X-rays, 
known as synchrotron radiation. 

Fourth-generation light sources promise 
to squeeze the electrons into tighter bunches, 
leading to X-ray pulses that concentrate more 
photons into a tighter, brighter beam. This 
means that it will take just minutes for researchers to do experiments that could take days on a
third-generation machine, Jacobsen says.


FOURTH GENERATION 
Eventually, beams from fourth-generation 
machines could enable materials scientists to 
observe chemical reactions inside a battery as 
they happen, or structural biologists to reveal 
the structure of proteins from smaller protein 
crystals than those needed at existing light 
sources. 

3 SEPTEMBER 2015 | VOL 525 | NATURE | 15
© 2015 Macmillan Publishers Limited. All rights reserved

The crucial innovation in the fourth-generation machines is to employ a narrower
vacuum pipe in which to circulate the
electrons. In MAX IV’s case, the pipe is 
22 millimetres across, about half as wide 
as in a typical existing synchrotron. This 
makes it possible to get stronger mag- 
netic fields using more-compact bending 
and focusing magnets, which are also less
expensive and, because of their smaller size,
can consume one-tenth as much electricity as
those of third-generation systems.

But keeping such a narrow pipe free of 
air would not have been possible using 
conventional high-vacuum pumps alone. 
MAX IV borrowed a technology from the 
Large Hadron Collider (LHC) at CERN, 
Europe’s particle-physics facility near 
Geneva, Switzerland, which circulates 
protons rather than electrons. The LHC’s 
trick — now adopted by MAX IV — is to 
coat the inner surface of the pipes with a 
special alloy that absorbs any gas molecules 
that happen to bounce around inside the 
tubes. 

“The Swedes should be very proud of 
their innovative fabrication techniques, 
which lower the cost of making these 
machines,” says physicist Herman Winick,
a veteran synchrotron builder at the SLAC 
National Accelerator Laboratory in Menlo 
Park, California. 

In the next few weeks, the MAX IV team 
will have to test whether they can circulate 
the large number of electrons that will be 
necessary to produce high-quality beams 
of X-rays, says Robert Hettel, an accelera- 
tor physicist at SLAC. And in subsequent 
months, they will build eight experimen- 
tal stations, or beamlines, around the 
synchrotron, which they plan to open 
on 21 June 2016, a date chosen for the 
symbolism of the summer solstice. 

The synchrotron that fired up on 
25 August is the larger of two that the MAX 
IV team is building; the smaller fourth-gen- 
eration machine will produce electrons of 
1.5 GeV for making ‘softer’, or less energetic,
X-rays. The combined cost of the machines 
and of the first eight beamlines will be 
4.5 billion Swedish kronor (US$530 mil- 
lion), Quitmann says, which is being paid 
for by the Swedish government. 

Quitmann says that his team reached “a
major milestone last night”. But, he adds,
“We have still a long way to go.”


The US Precision Medicine Initiative aims to collect health and genetic data from 1 million people. 


PERSONALIZED MEDICINE


Health study set to 
decide data policy 


Specialists are split over whether participants should have 
free access to their genetic information. 


BY SARA REARDON 


After dozens of unsuccessful treatments,
Eric Dishman started to suspect that
his illness was due to something other
than the rare kidney cancer he was diag- 
nosed with in 1989. Five years ago, he had his 
whole genome sequenced, then gave the data 
to oncologists — and learned that he had a 
different type of cancer altogether. 

He was treated successfully, and remains 
cancer-free. “I was an early prototype for 
precision medicine,” he says. 

Dishman now leads the health and 
life-sciences division of microprocessor giant 
Intel in Banks, Oregon. He is also a member 
of a working group run by the US National 
Institutes of Health (NIH) for the Precision 
Medicine Initiative (PMI) — a US$215-million 
project to collect data on genomes, health 
records and physiological measurements from
1 million participants, to learn how genetics,
environment and lifestyle influence disease 
risk and the effectiveness of treatments. 

Next month, the group is expected to release 
a project plan. Observers are eager to learn its 
answer to a key question: how much informa- 
tion about disease risk, especially genetic data, 
will the project share with participants? 

That issue is the subject of much debate. 
Dishman and others say that participants 
should at least have the option to see all their 
personal data so that they can investigate their 
own health, just as he did. But some specialists 
in the field say that showing participants their 
data is irresponsible, because the information 
is challenging for people to interpret and its 
significance is often uncertain. 

Most genetic variants linked to disease 
increase risk only slightly, yet people who dis- 
cover that their genome holds such a variant 
might worry excessively or seek unnecessary 
medical tests. Or they might do nothing: the
limited research on how people react suggests 
that, far from causing panic, information about 
common variants of small-to-moderate effect 
does not seem to motivate people to make rec- 
ommended long-term behavioural changes to 
lessen risk. “Unless you give people the tools 
and the skills to deal with the raw data, I don't 
see how you could give them the raw data,” says
Brian Van Ness, a geneticist at the University of 
Minnesota in Minneapolis. 
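The gap between relative and absolute risk is one reason small-effect variants are hard to interpret. The Python sketch below uses made-up numbers purely for illustration: a hypothetical variant that raises risk by 20% relative to a hypothetical 10% baseline. It also makes the simplifying assumption that the variant simply multiplies baseline risk.

```python
def carrier_risk(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk for a carrier, assuming the variant multiplies baseline risk."""
    return baseline_risk * relative_risk


baseline = 0.10        # hypothetical lifetime risk for non-carriers
relative_risk = 1.2    # hypothetical small-effect variant (+20% relative)

carrier = carrier_risk(baseline, relative_risk)
extra = carrier - baseline   # absolute change, in probability units

print(f"Non-carrier risk: {baseline:.0%}")
print(f"Carrier risk:     {carrier:.0%}")
print(f"Absolute change:  {extra * 100:.0f} percentage points")
```

In this made-up case, a headline-friendly "20% higher risk" amounts to 2 extra cases per 100 people.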

Others counter that letting researchers 
choose what information to share with participants is paternalistic. “I don’t know why we’re
so afraid of the genome that we say it’s danger- 
ous for people to have this information,” says 
Sharon Terry, director of the Genetic Alliance 
in Washington DC. “All of us live with all kinds 
of uncertainty all the time.”

The NIH, together with US President Barack 
Obama, who officially launched the PMI in his 
January 2015 State of the Union speech, says that 
it wants participants to be active partners in the 
research. And given the programme’s large size
and high profile, the working group's choices 
could establish a precedent for similar projects. 
“What's implemented in this cohort will be the 
future of research,” says NIH working-group co- 
chair Bray Patrick-Lake, who works in patient 
engagement in research at Duke University in 
Durham, North Carolina. 


Dishman, for his part, favours a tiered system 
in which each participant decides how much 
data to receive. Although he credits his eventual 
cancer cure to his analysis of his own genome, 
he says that not everyone has the education or 
the desire to interpret genetic information. If 
the PMI achieves its goal of recruiting a diverse 
population, Dishman says that it will probably 
include some participants who would rather not receive their genetic data. “There are so many psychological, emotional and cultural responses, and you can’t just create blanket policies,” he says.

Patrick-Lake says that the group will 
probably set a single policy for the whole 
initiative, although discussing what that will be 
would be premature. 

If the working group does decide to give 
participants access to their data, it will have to 
address many issues related to the transaction. 
US law requires genetic data used for clinical 
decision-making to be of higher quality — and 
thus more costly to produce — than data used 
only for research. So sharing data could drive 
up costs. The project will also need to decide on 
questions such as whether to release data to the 
families of participants who have died. 

Historically, most studies have not given 
raw data to individuals, especially if the findings do not reveal any information that participants could use to improve their health.
But public opinion on data sharing seems 
to be shifting towards openness, says Susan 
Wolf, who studies public-health law and bio- 
ethics at the University of Minnesota. “We're 
past the era when scientists can simply take 
specimens, generate data with great health 
importance, and decline to offer any of that 
data back to people.”

The UK Biobank in Stockport, which is 
enrolling 500,000 people, collects genomic 
and health data but does not return individual 
results to participants. That was the simplest 
logical approach when the biobank was set up 
in 2007, says executive director Rory Collins, an 
epidemiologist at the University of Oxford, UK, 
who is a member of the NIH working group. “If 
the UK Biobank was set up now, it’s plausible 
that a different decision would be taken,” he
says. “I think the Precision Medicine Initiative
may take a different decision.”
SEE EDITORIAL P.5


CORRECTION 

The News Feature ‘The cannabis experiment’ 
(Nature 524, 280-283; 2015) incorrectly 
located the University of Colorado School of 
Medicine in Denver. It is actually in Aurora. 
BY MARK PEPLOW


INSPIRED BY BIOLOGY, CHEMISTS HAVE CREATED A CORNUCOPIA OF MOLECULAR PARTS THAT ACT AS SWITCHES, MOTORS AND RATCHETS. NOW IT IS TIME TO DO SOMETHING USEFUL WITH THEM.




THE MACHINES 


The robot moves slowly along its track, pausing regularly to reach out an arm that carefully
scoops up a component. The arm connects the component to an elaborate construction 
on the robot’s back. Then the robot moves forward and repeats the process — systemati- 
cally stringing the parts together according to a precise design. 

It might be a scene from a high-tech factory — except that this assembly line is just a few 
nanometres long. The components are amino acids, the product is a small peptide and the robot, 
created by chemist David Leigh at the University of Manchester, UK, is one of the most complex 
molecular-scale machines ever devised. 

It is not alone. Leigh is part of a growing band of molecular architects who have been 
inspired to emulate the machine-like biological molecules found in living cells — kinesin 
proteins that stride along the cell’s microscopic scaffolding, or the ribosome that constructs 
proteins by reading genetic code. Over the past 25 years, these researchers have devised an 
impressive array of switches, ratchets, motors, rods, rings, propellers and more — molecular 
mechanisms that can be plugged together as if they were nanoscale Lego pieces. And progress 
is accelerating, thanks to improved analytical-chemistry tools and reactions that make it
easier to build big organic molecules.

Now the field has reached a turning point. “We’ve made 50 or 60 different motors,” says Ben Feringa, a chemist at the University of Groningen in the Netherlands. “I’m less interested in making another motor than actually using it.”

That message was heard clearly in June, when one of the influential US Gordon conferences focused for the first time on molecular machines and their potential applications, a clear sign that the field has come of age, says the meeting’s organizer, chemist Rafal Klajn of the Weizmann Institute of Science in Rehovot, Israel. “In 15 years’ time,” says Leigh, “I think they will be seen as a core part of chemistry and materials design.”

A molecular ‘nanocar’ travels across a metal surface, propelled by bonding changes.

Getting there will not be easy. Researchers 
must learn how to make billions of molecular 
machines work in concert to produce measur- 
able macroscopic effects such as changing the 
shape of a material so that it acts as an artifi- 
cial muscle. They must also make the machines 
easier to control, and ensure that they can carry 
out countless operations without breaking. 

That is why many in the field do not expect 
the first applications to involve elaborate con- 
structs. Instead, they predict that the basic 
components of molecular machines will 
be used in diverse areas of science: as light- 
activated switches that can release targeted 
drugs, for example, or as smart materials that 
can store energy or expand and contract in 
response to light. That means that molecular 
architects need to reach out to researchers 
who work in fields that might benefit from 
their machine parts, says Klajn. “We need to 
convince them that these molecules are really 
exciting.” 


SHUTTLE LAUNCH 

Many of today’s molecular machines trace 
their origins to a relatively simple device built 
in 1991 by Fraser Stoddart, a chemist now at 
Northwestern University in Evanston, Illinois. 
It was an arrangement known as a rotaxane, 
in which a ring-shaped molecule is threaded 
onto an ‘axle’: a linear molecule capped by
bulky stoppers at each end. Included in this 
particular axle, towards either end of the chain, 
were two chemical groups that could bind to 
the ring. Stoddart found that the ring could
hop back and forth between these two sites, 
creating the first molecular shuttle. 

By 1994, Stoddart had modified the design 
so that the axle had two different binding 
sites. The shuttle existed in solution; changing
the acidity of this liquid forced the ring
to hop from one site to the other, making 
the shuttle into a reversible switch. Similar 
molecular switches could one day be used 
in sensors that respond to heat, light or spe- 
cific chemicals, or that open the hatch of a 
nanoscale container to deliver a cargo of drug 
molecules at precisely the right time and to 
exactly the correct place in a person's body. 

Stoddart’s switches displayed two properties 
that would come up again and again in the 
molecular machines that followed. First, the 
links between the ring and the axle’s binding 
sites were not the strong covalent bonds that 
knit atoms into molecules. Instead, they were 
weaker electrostatic attractions between slightly 
positive and negative regions of the two com- 
ponents. This meant that the bonds could be 
readily formed and broken, much like zipping 
and unzipping the hydrogen bonds that link the 
two strands of DNA. Second, the shuttles did
not need an external energy source to zip back
and forth. They were powered by collisions with 
other molecules in the solution, a jostling effect 
called Brownian motion. 

A plethora of other switches soon followed. 
Some were controlled with light or changes in 
temperature, whereas others worked by bind- 
ing specific ions or molecules from solution, in 
a similar way to how ion channels work in cell 
membranes, opening or closing in response to 
chemical signals. 

Stoddart, however, took his research in a different direction. Working with James Heath at the California Institute of Technology in Pasadena, he used millions of rotaxanes to make a memory device. Sandwiched between silicon and titanium electrodes, the rotaxanes could be electrically switched from one state to another and used to record data. This molecular abacus, roughly 13 micrometres across, contained 160,000 bits, each composed of a few hundred rotaxanes. That is a density of roughly 100 gigabits per square centimetre, comparable to the best commercial hard drives available today.

Using 24 of the best-performing bits, Stoddart’s team stored and retrieved the letters ‘CIT’ (for the California Institute of Technology). But the switches were not very robust, typically falling apart after fewer than 100 cycles. One promising solution is to load them into tough, porous crystals known as metal-organic frameworks (MOFs), which protect the switches and organize them into a precise 3D array (see Nature 520, 148-150; 2015).

Earlier this year, Robert Schurko and Stephen Loeb of the University of Windsor, Canada, showed that they could pack about 10” molecular shuttles into each cubic centimetre of a MOF. And last month, Stoddart unveiled a different MOF that contained switchable rotaxanes. The MOF was mounted on an electrode, and the rotaxanes could be switched en masse by changing the voltage.

Researchers working on these MOFs hope that the 3D, solid scaffolds will offer a greater density of switches than conventional silicon transistors, and make the molecules easier to switch in a controllable way, potentially offering vast amounts of data storage. “The sci-fi way to think about it would be to address each molecule as a bit,” says Loeb. But more realistically, he says, a speck of the MOF containing hundreds of switches could act as one bit. As long as most of the switches in the speck function properly, he says, they will collectively and reliably encode data.

Others have used rotaxanes to make switchable catalysts. In 2012, Leigh described a system with a nitrogen atom in the middle of the rotaxane’s axle, where it is normally covered by a ring. Add an acid, and the ring moves to one side, exposing the nitrogen atom so that it can catalyse a common chemical reaction. Leigh has since gone further: last November, he reported a rotaxane system with two different catalytic sites. Moving the ring from one to the other allowed the chemists to switch the rotaxane’s activity, so that it could stitch together a mixture of molecules in two different ways.
Leigh is now working on putting several dif- 
ferent switchable catalysts into the same solu- 
tion, where they could be toggled on and off 
in a sequence to build target molecules into 
complex products, in much the same way as 
enzymes do in a cell. 
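Loeb’s suggestion that a speck of hundreds of switches could act as one bit is a form of redundancy, and a simple model shows why it can work. The Python sketch below is purely illustrative and makes two assumptions. First, each switch in a speck works independently with the same probability; the 90% in the usage lines is a made-up figure. Second, the bit reads correctly when a strict majority of its switches work. Real failures in a MOF could well be correlated, which would weaken the effect.

```python
from math import comb


def majority_reliability(n_switches: int, p_working: float) -> float:
    """Probability that a strict majority of n switches work.

    Assumes each switch works independently with probability p_working.
    With an even n, an exact tie counts as a failed read.
    """
    needed = n_switches // 2 + 1
    return sum(
        comb(n_switches, k) * p_working**k * (1 - p_working) ** (n_switches - k)
        for k in range(needed, n_switches + 1)
    )


# Hypothetical per-switch reliability of 90%.
for n in (1, 11, 101, 301):
    print(n, majority_reliability(n, 0.9))
```

Under these assumptions, reliability rises from 0.9 for a lone switch to extremely close to 1 for a speck of a few hundred. That pattern matches the intuition behind treating a speck, rather than a single molecule, as the unit of storage.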


NANO MOTORS 

In 1999, after early experiments with shuttles 
and switches, the field took a big step forward 
with the creation of the first synthetic molecular 
motor. Built by Feringa’s team, it was a single
molecule containing two identical ‘paddle’ units 
connected by a carbon-carbon double bond. 
This fixed the paddles in place until a burst of 
light broke part of the bond, allowing the pad- 
dles to rotate. Crucially, the shape of the paddles 
meant that they could turn in only one direction, and as long as there was a supply of light and
some heat, the motor would just keep spinning.

Feringa went on to use similar molecular 
motors to create a four-wheel-drive ‘nanocar’. He also showed that the motors could
give liquid crystals enough of a twist to slowly 
rotate a glass rod sitting on top of them. The 
rod was 28 micrometres long — thousands of 
times the size of the motors. 

Some chemists argue that although these 
motors are cute, they are ultimately useless 
by themselves. “I’ve always been a bit scepti- 
cal of artificial motors — they’re too difficult 
to make, too difficult to scale up,” says Dirk 
Trauner, a chemist at the Ludwig Maximilian 
University in Munich, Germany. 

But the chemical principles behind them 
might be very useful indeed. Using the same 
light-activated mechanism, researchers have 
developed around 100 drug-like compounds 
that can be switched on or off in response 
to light. 

In July, for example, a team led by Trauner 
reported a light-switchable version of
combretastatin A-4, a potent anticancer 
compound that comes with some serious side 
effects, because it indiscriminately attacks 
tumour cells and healthy tissue alike. The 
team’s switchable drug could drastically reduce 
system-wide side effects: it contains a nitrogen–nitrogen
double bond that holds two sections of
the molecule apart and renders it inactive. Only 
under blue light will the bond break and allow 
the sections to rotate into the molecule’s active 
form. Trauner says that an area of tissue just 10 
micrometres wide can be specifically targeted 
in this way, using light delivered through a flex- 
ible tube or by an implanted device. Trauner is 
planning mouse studies to test the compound's 
effectiveness against cancer. 

He also hopes to use photoswitchable 
compounds to restore vision in people with 
macular degeneration or retinitis pigmentosa, 
conditions that damage the eye's light-sensing 
rod and cone cells. “It’s low-hanging fruit 
— because it’s in the eye, you don’t have to 
worry about how to get the light in,” he says. 
Last year, he showed that one injection of
a photoswitchable molecule called DENAQ 
into the eyes of blind mice partially restored 
their vision for several days, allowing the ani- 
mals to distinguish between light and dark. 
The team is now trying the same technique 
in primates, and hopes to begin human trials 
in two years’ time. 

Trauner and Klajn both agree that the main 
challenge will be to convince the cautious 
pharmaceutical industry that photoswitchable 
drugs have potential, even though they have no 
track record in humans. “We need to get the 
pharmaceutical industry excited about photopharmacy,”
says Trauner. “Once they see the
value, we'll be in good shape.” 


WALK THE LINE 

Long before any creature had evolved to move 
on dry land, cells were using legs as part of 
their cellular machinery. Prime examples are 
the two-pronged proteins called kinesins, 
which put one ‘foot’ in front of the other as 
they carry molecular cargo along the cell’s stiff 
scaffolding of microtubules. 

Inspired by kinesin, researchers have built 
artificial walkers from DNA. The molecules 
typically have feet that are anchored in place 
by binding to complementary strands of 
DNA laid out on a track; adding a competing
DNA strand can free the foot, allowing it to
take a step forward. One of the most striking
examples was described in 2010 by Nadrian
Seeman at New York University. His DNA
walker had four ‘feet’ and three ‘hands’, with
which it could pick up gold nanoparticles as 
it moved around a tile made of folded DNA. 

DNA walkers — and variants that soon 
trundled out of other labs — would wander 
aimlessly if they did not have a built-in ratchet 
system to stop them taking a step backward. 
For many walkers, that ratchet lies in the
relative rates of the chemical reactions that are
involved in binding and releasing their feet,
with the pummelling of Brownian motion
driving the released foot forward.
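The ratchet principle can be illustrated with a toy simulation. The Python sketch below is not a model of any published walker. It assumes a one-dimensional track of evenly spaced binding sites, unbiased random kicks standing in for Brownian motion, and two hypothetical rebinding probabilities (p_bind_forward and p_bind_backward) standing in for the relative reaction rates. When the two probabilities are equal, the walker wanders aimlessly. When rebinding behind it is slower, it drifts forward even though every individual kick is random.

```python
import random


def expected_drift(p_bind_forward: float, p_bind_backward: float) -> float:
    """Mean displacement per step: each direction is attempted half the time."""
    return 0.5 * (p_bind_forward - p_bind_backward)


def simulate_walker(steps: int, p_bind_forward: float, p_bind_backward: float,
                    seed: int = 0) -> int:
    """Return the final site index of a toy walker that starts at site 0.

    Each step, a released foot is kicked forward or backward with equal
    probability (unbiased thermal motion). It only lands on the new site if it
    binds there; the direction-dependent binding probability is the ratchet.
    """
    rng = random.Random(seed)
    position = 0
    for _ in range(steps):
        direction = 1 if rng.random() < 0.5 else -1
        p_bind = p_bind_forward if direction == 1 else p_bind_backward
        if rng.random() < p_bind:
            position += direction
        # Otherwise the foot rebinds where it was and the walker stays put.
    return position


if __name__ == "__main__":
    n = 10_000
    for label, p_back in [("no ratchet", 0.9), ("ratchet", 0.1)]:
        final = simulate_walker(n, 0.9, p_back)
        expected = expected_drift(0.9, p_back) * n
        print(f"{label}: final site {final}, expected about {expected:.0f}")
```

With these assumed values, the ratcheted walker should end up roughly 4,000 sites from its start, whereas the unbiased one typically stays within a few hundred sites of zero.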

Over the past few years, detailed chemical 
studies and molecular dynamics simulations 
have shown that this ‘Brownian ratchet’ con- 
cept underlies all chemically driven molecular 
machines, including many biological motors. 
In 2013, for example, a team led by Nils Wal- 
ter, a chemical biologist at the University of 
Michigan in Ann Arbor, found the same
mechanism at work in the spliceosome, a
cellular machine that snips sections out of RNA
before genetic information is translated to
make proteins. “Kinesin uses it, the ribosome
uses it and the spliceosome uses it,” says Walter.

That shows that the same principles underlie 
biological machines and synthetic molecular 
machines, so researchers working in the two 
areas could share knowledge. “By and large, 
they’re quite separate fields right now,” says
Walter. “I think the next breakthroughs will 
come if we all sit at the same table.” 


ROCKET SCIENCE 

Meanwhile, inspired by the microscopic medi- 
cal submarine of the cult 1966 film Fantastic 
Voyage, chemists have created an array of 
micrometre-sized particles and tubes that can 
zip through liquids like rockets. 

Some of these motors carry a catalyst that
generates thrust by producing a stream of
bubbles from the liquid around them — often
hydrogen peroxide. Others get their power
directly from light or from external electric
and magnetic fields, which can also be used to
steer the vessels. “These nanomotors can go
over 1,000 times their own length per second,
it's incredible,” says nanoengineer Joseph Wang
of the University of California, San Diego. He
thinks that the most promising applications lie
in fast drug delivery, or low-cost clean-up of
environmental pollutants — although many
in the field caution that it is too early to tell
whether nanomotors would trump conventional
methods.

Mechanisms the size of molecules, governed by the rules of chemistry rather than Newtonian mechanics, could have applications ranging from drug delivery to nanoscale computer memories.

NANOCAR: Electrons from a scanning tunnelling microscope tip (not shown) leap onto the molecules that form the ‘wheels’ of this device, causing them to change configuration, rotate and move the car forward.

Hydrogen peroxide, a powerful oxidizing 
agent, is hardly conducive to in vivo use. “When 
all the work was based on peroxide there was a 
lot of scepticism,” Wang admits. But in December
last year, he reported a microscale motor
suitable for testing in live animals. Made of a 
plastic tube roughly 20 micrometres long, it 
contains a core of zinc that reacts with stomach 
acid to generate propulsive bubbles of hydrogen. 

The tubes safely zipped around inside a 
mouse’s stomach for about 10 minutes. Wang 
used them to carry gold nanoparticles into 
surrounding stomach tissue; mice dosed with
plain nanoparticles ended up with only one-third
as much gold in their stomach lining as mice
dosed with the tubes.

Wang suggests that loading drugs or 
imaging compounds onto the rockets could 
deliver them into stomach tissue rapidly and 
effectively. “In the next five years we will move 
to practical in vivo applications,” he says. “It 
really is the fantastic voyage.” 

At the moment, there is limited crossover
between research on these rockets and the
molecular machines. “But we could bring
a lot,” says Klajn. For example, coating a
micromotor with light-responsive molecular
switches could offer extra control over its
movement, he suggests.

MOLECULAR PUMP: Chemical reactions in a surrounding solution drive two molecular rings into a holding area. The ring molecule starts with four positive charges and is repelled from the positive dumbbell. A reaction adds two electrons to the ring and one electron to Area 1; interactions between these unpaired electrons hold the ring in place.

MOLECULAR ASSEMBLY LINE: A ring moves along a linear molecule, using its nanoscale arm to pick up amino acids mounted along the shaft.


PUMP IT UP 

In their quest to forge molecular machines that 
can actually do something useful, researchers 
are starting to integrate several different com- 
ponents into a single device. In May this year, 
Stoddart unveiled an artificial molecular
pump that pulls two ring molecules out of solu- 
tion onto a storage chain. Each ring slips over 
a stopper at one end of the chain, attracted to a 
switchable binding point. Flipping that switch 
pushes the ring over a second barrier farther 
along the chain, where it reaches a holding area 
(see ‘Nano machines’). 

The system is not able to pump any other 
type of molecule, and it took a lot of trial and 
error to build. “It’s been a long road,” sighs 
Stoddart. But it proves that molecular machines 
can be used to concentrate molecules, pushing 
a chemical system out of equilibrium in the 
same way that biology can build up a store of 
potential energy by forcing ions or molecules 
up a concentration gradient. “We're learning 
how to design an energy ratchet,” he says.

Stoddart says that such developments could 
enable the field to progress in two major direc- 
tions: stay nano, giving the machines molecu- 
lar-scale jobs that cannot be achieved in any 
other way; or go macro, using trillions of them 
together to reshape materials or move substan- 
tial cargoes, like an army of ants. 

Perhaps the prime example of the nano 
approach is Leigh’s molecular assembly line.
Inspired by the ribosome, it is based on a rotax- 
ane system that picks up amino acids from its 
axle and adds them to a growing peptide chain. 
But the devices could have macro applications. 
Over 36 hours, 10¹⁸ of them working together
can produce a few milligrams of peptide. “It
doesn’t do anything that you can't do in the lab
in half an hour,” says Leigh. “Yet it shows that
you can have a machine that moves down a
track and picks up molecular building blocks
and puts them together.” Leigh is now working
on other versions of the machine to make
sequenced polymers, with tailored material 
properties. 

Conversely, trillions of molecular machines 
working together could change the properties 
of materials in the macroscopic world. Gels 
that expand or contract in response to light or 
chemicals, for example, could act as adjustable 
lenses or sensors. “In the next five years, I bet 
you'll get the first smart materials where you 
have switches incorporated,” says Feringa.

Rotaxane-like molecules are already 
starting to see commercial applications. The 
Nissan Scratch Shield iPhone case, launched 
in 2012 and based on work by Kohzo Ito at 
the University of Tokyo, is made of polymer 
strands threaded through pairs of barrel- 
shaped cyclodextrin molecules connected in 
a figure-of-eight shape. Pressure on a normal 
polymer coating would break the connections 
between the chains, leaving a scratch. But the 
cyclodextrin rings act like the wheels of a pul- 
ley system, allowing the polymer strands to 
slip through without breaking. The films can
even protect a brittle screen from a sustained
beating with a hammer.

For Stoddart, this shows that the compo- 
nents developed by molecular architects are 
already ripe for application. “This field has 
come a long way,” says Stoddart. “Now we have
to start showing it’s useful.”


Mark Peplow is a science journalist based in 
Cambridge, UK. 
THE TROUBLE
WITH WEARABLES 


Electronic gadgets on — and in — our bodies are multiplying fast, 
but transmitting all their data safely will be a challenge. 


BY KAT AUSTEN 


Tom is late for his train and doesn't know the way to the station.
Racing around a corner, he runs into a plaza full of tourists snap- 
ping and uploading photos to Instagram and Facebook. Which 
way should he go? He tells his Internet-connected contact lenses 
to load a map, meanwhile tapping at his smartwatch to pull up his 
ticket and platform information. An alarm flashes in his peripheral
vision: only 15 minutes until the train departs. But the map is not loading.
He looks around in dismay, frantically yelling “refresh” to his lenses
against the clamour of the street. An alert scrolls across his vision: “You're 
feeling stressed. Take a breath. Have a hug!” But with all the tourists 
accessing the Internet, Tom has no hope of getting his much-needed map. 

Welcome to the chaotic future of wearable electronics: devices that 
promise to connect real to digital lives seamlessly. These gadgets are rap- 
idly multiplying, and within five years there could be half a billion devices 
strapped onto, or even embedded in, human bodies. Today, the most 
familiar gadgets are fitness trackers and smart watches, which monitor 
health and provide ready access to online services. But there are already 
devices that claim to do more than monitor, such as headbands that alert 
wearers when they become distracted or wristbands that administer elec- 
tric shocks to smokers who want help quitting. Electronics companies 
promise to transform medicine with wearables that can treat symptoms 
or manage care. Devices are emerging that alert people with epilepsy to 
incipient seizures, help prevent anxiety attacks, and enable blind people 
to navigate. 

But the potential of wearables crucially depends on the large amounts 
of data they access and generate. And that leads to two problems that 
researchers and technology developers are struggling to solve: finding 
improved ways to transmit data to and from wearables, and keeping all 
that information safe. With everything from toasters to cars now con- 
necting wirelessly to the Internet, demands on a finite bandwidth are 
rapidly straining the system. Nearly half a billion new devices started 
chattering over mobile broadband last year alone, pushing mobile traffic 
to 25 times what it was just 5 years ago. And wearables are leading to new 
security concerns, from the use of highly personal data to track people's
activity to malicious attacks on their online presence.

“It’s a cliché that whenever there's a new technology we start talking
about Huxley and Brave New World, but with wearables — and what's
loosely termed the Internet of Things — we truly are entering into a new
era, and we have to start thinking of these issues,” says Anupam Joshi,
head of the Center for Cybersecurity at the University of Maryland, 
Baltimore County. 


TRAFFIC JAM 

By the end of 2014, global mobile-data traffic reached 2.5 exabytes (2.5 billion
gigabytes) per month according to the networking-technology company 
Cisco Systems. Of that, the world’s 100 million or so wearable devices 
were generating 15 million gigabytes of monthly traffic on what is a physi- 
cally finite portion of the electromagnetic spectrum, with their number 
expected to increase fivefold by 2019 (see ‘The catch with gadgets’). On
top of the surge in those devices, there will be even greater chances for 
gridlock, as more people start wearing headsets that deliver data-hungry 
virtual and augmented reality experiences, says Robert Heath, a professor 
in electrical engineering at the University of Texas at Austin. 
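The Cisco figures imply that wearables were still a small slice of the total. A back-of-the-envelope check, taking the “100 million or so” devices as exactly 10^8 for illustration:

```latex
\[
\frac{15\times 10^{6}\ \text{GB}}{2.5\times 10^{9}\ \text{GB}} = 6\times 10^{-3} = 0.6\%\ \text{of monthly mobile-data traffic}
\]
\[
\frac{15\times 10^{6}\ \text{GB}}{10^{8}\ \text{devices}} = 0.15\ \text{GB} = 150\ \text{MB per device per month}
\]
\[
N_{2019} \approx 5 \times N_{2014} = 5 \times 10^{8}\ \text{devices}
\]
```

The projected fivefold rise would therefore take the count to about half a billion devices by 2019, in line with the estimate at the start of this article.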

All these devices clog up the airwaves, impairing performance and 
threatening essential Internet traffic. To help ease congestion in the
United States, the government pledged in 2010 to free up an extra
500 megahertz (MHz) within ten years, a doubling of the bandwidth
available for mobile devices at the time. But even this is unlikely to 
be enough, according to a more recent report prepared for CTIA-The 
Wireless Association, a communications industry group based in Wash- 
ington DC. It estimates that 350 MHz will need to be added from 2015 
onwards to keep up with US demand by the end of 2019, 150 MHz more 
than the government estimate for that period. And limited bandwidth 
is a global problem, with each country dealing with it in its own way. In 
India, where users have access to just one-tenth of the bandwidth avail- 
able to people in the United States, there are calls for spectrum sharing 
and the freeing up of channels currently devoted to the military. In the
United Kingdom, the government has approved the use of old analogue 
TV bandwidths; the first networks of smart devices using these frequen- 
cies could be rolled out by the end of the year. 

For their part, telecom companies need to make more efficient use 
of the spectrum. One way is to look beyond the crowded parts of the 
airwaves in the radio and television bands. Data from all the wearables 
on one person could flow through a body-area network designed to use 
a completely different part of the spectrum, such as millimetre
wavelengths. Then just one device would use the more congested bands to
communicate all the data to the Internet. This creates its own problems, 
however, because shorter wavelengths demand more power and can be 
blocked by people's bodies. So researchers such as Heath are trying to 
get around those difficulties by, for example, optimizing antennas to 
reduce interference and power consumption. Improvements in steerable 
communication beams could also lead to better ways of transmitting 
millimetre-wavelength signals. 

Also promising is the idea of taking wireless communications into 
the visible-light realm using light-emitting diodes (LEDs) — which
produce light and can act as photoreceptors — to communicate either 
between wearables or to talk directly to the Internet. Wearables that 
incorporate LEDs could use visible light to wrap a person in a body- 
area network. That would sense every movement and communicate the 
information to the light fittings in a room, which would be connected to 
the Internet through their power wiring. Although this technology relies 
on visible wavelengths, the signals are imperceptible. “LEDs blink so fast
that the human eye cannot tell,” says Daniele Puccinelli, an electrical
engineer at the University of Applied Sciences of Southern Switzerland 
in Manno, who studies visible-light communications. 
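One minimal way to picture data riding on LED flicker is on-off keying, in which the LED is switched on for a 1 bit and off for a 0 bit during each symbol period. The article does not say which modulation the systems mentioned here use, so on-off keying is an assumption chosen for simplicity, as is the hypothetical SYMBOL_RATE_HZ of one million symbols per second. Practical systems usually add line coding, such as Manchester coding, so that long runs of zeros do not cause visible dimming or flicker; that step is omitted from this sketch.

```python
SYMBOL_RATE_HZ = 1_000_000  # hypothetical: one million on/off symbols per second


def encode_ook(data: bytes) -> list[int]:
    """Map each bit, most significant first, to an LED state: 1 = on, 0 = off."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def decode_ook(states: list[int]) -> bytes:
    """Rebuild bytes from sampled LED states, one sample per symbol period."""
    if len(states) % 8:
        raise ValueError("expected a whole number of bytes")
    out = bytearray()
    for i in range(0, len(states), 8):
        byte = 0
        for bit in states[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    message = b"Hi"
    states = encode_ook(message)
    print(states)
    assert decode_ook(states) == message
    duration_us = len(states) / SYMBOL_RATE_HZ * 1e6
    print(f"{len(states)} symbols take {duration_us:.0f} microseconds at the assumed rate")
```

Even at this assumed rate, a short reading is sent in tens of microseconds, far faster than the eye can follow.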

Harald Haas, who researches mobile communications at the Univer- 
sity of Edinburgh, UK, plans to test a visible-light system in hospitals 
within the next year. Patients will wear wristbands that monitor their 
temperature and relay the data using LEDs that communicate with the 
hospital's lighting. 

A broader approach might have wearable devices from many people 
relaying information to each other rather than having each connect to 
the Internet. This concept underpins the multitiered networks promised 
by the much-vaunted fifth-generation (5G) mobile-communication 
systems that are predicted to be up and running in many parts of the 
world by 2020. In situations where crowds of people are trying to access 
the same content — travel information after a sports match, for instance 
— one device could act as a ‘seed’, distributing the data to others in this
network, which would reduce the number of times the data need to be 
downloaded from the Internet. 
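The saving from seeding can be estimated with a simple sketch. The Python below assumes hypothetical devices scattered in a plaza, a fixed single-hop radio range for device-to-device relaying, and a greedy rule in which any device that still lacks the content downloads it and passes it to every neighbour within range. It ignores multi-hop relaying, interference and scheduling, and it is not how any 5G standard specifies the process; it only shows why sharing cuts the number of Internet downloads.

```python
import math
import random


def count_downloads(positions: list[tuple[float, float]], radio_range: float) -> int:
    """Count Internet downloads needed when seeds relay content one hop.

    Greedy rule: pick any device without the content, let it download from the
    Internet, then mark every device within radio_range of it as served.
    """
    if radio_range < 0:
        raise ValueError("radio_range must be non-negative")
    unserved = set(range(len(positions)))
    downloads = 0
    while unserved:
        seed = min(unserved)  # deterministic choice of the next seed device
        downloads += 1
        # The seed itself is at distance 0, so it is always removed here.
        unserved = {i for i in unserved
                    if math.dist(positions[seed], positions[i]) > radio_range}
    return downloads


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical crowd: 1,000 devices in a 100 m x 100 m plaza, 10 m radio range.
    crowd = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(1_000)]
    print("without seeding:", len(crowd), "downloads")
    print("with seeding:   ", count_downloads(crowd, 10.0), "downloads")
```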

One of the most attractive approaches makes devices smarter about 
when and how they use communication channels. These ‘cognitive radios’ 
sniff out underused regions of bandwidth and opportunistically hop into 
those gaps, speeding up communications. To reach their optimum poten- 
tial, bandwidths would need to be more open, so that devices could jump 
onto a licensed frequency to communicate, and then drop off the spec- 
trum when someone with higher priority enters. Although techniques 
based on this principle have been used for decades, cognitive radio will 
take it to a new level of efficiency, with devices smart enough to negotiate 
with each other to divvy up the available spectrum. 
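The opportunistic behaviour described above can be sketched as a small state machine. In this Python example, CognitiveRadio is a hypothetical class, channel sensing is assumed to be perfect (the caller passes in the set of channels that licensed users currently occupy), and the negotiation between several secondary devices that the article anticipates is left out. The radio hops into the first idle channel it finds and vacates immediately when a higher-priority user returns.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CognitiveRadio:
    """Toy secondary user that borrows licensed channels while they are idle."""
    channels: list[int]                      # channels this radio may use
    current: Optional[int] = None            # channel currently in use, if any
    log: list[str] = field(default_factory=list)

    def step(self, primary_busy: set[int]) -> Optional[int]:
        """Advance one sensing interval and return the channel in use, or None.

        primary_busy is the set of channels occupied by licensed, higher-priority
        users; sensing is assumed to be perfect.
        """
        # Drop off the spectrum at once if a licensed user has reappeared.
        if self.current is not None and self.current in primary_busy:
            self.log.append(f"vacate {self.current}")
            self.current = None
        # If not transmitting, hop into the first channel sensed as free.
        if self.current is None:
            for channel in self.channels:
                if channel not in primary_busy:
                    self.current = channel
                    self.log.append(f"hop to {channel}")
                    break
        return self.current


if __name__ == "__main__":
    radio = CognitiveRadio(channels=[1, 2, 3])
    for busy in [{1}, {1}, {2}, {1, 2, 3}, set()]:
        print(sorted(busy), "->", radio.step(busy))
    print(radio.log)
```

With channels [1, 2, 3], a first step({1}) puts the radio on channel 2; a later step({2}) forces it to vacate and hop to channel 1, and when every channel is busy it stays silent.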

Cognitive radios have great potential, but their development in the 
wearables realm is being held back by a lack of accepted standards and 
protocols for how this frequency hopping might work in practice, says 
Ekram Hossain, an electrical engineer at the University of Manitoba,
Canada. “Until there is a standard, there won't be products,” says Hossain, who
adds that the research needed to establish these standards is under way. 


KEEPING SAFE 

When 176,000 people swarmed through the Consumer Electronics 
Show in Las Vegas in January, some of the hottest items were the crop of 
new wearable devices, ranging from watches and glasses to the Pacif-i,
a smart pacifier, or baby soother, that monitors an infant’s temperature
and transmits the data to a parent’s phone. And if those parents were
stressed out, they could try the Melomind headset, which is advertised
to measure the brain’s electrical activity, beam it to a phone and then
select the most appropriate music to help the wearer relax.

THE CATCH WITH GADGETS
Data concerns could thwart the vast expansion of wearable electronics.
The number of devices is rising quickly, putting strain on the already
clogged mobile network. There are also worries about security and privacy.

Despite all the hype about wearables, there is also considerable scepti- 
cism about the gadgets available today. “Lots of people view wearables 
as just toys,” says Puccinelli.

But signs point to them being much more useful in the near future, 
particularly in the medical arena. Wearables are increasingly measur- 
ing aspects of human physiology, providing electrical stimulation to 
the brain and even injecting medication. These applications come with 
potential risks for users. 

A key hurdle for the wearable revolution arises from the wealth of 
personal data they gather about their users. Surveys show that users 
worry about how these devices invade 
their privacy, as they upload intimate data 
to potentially vulnerable servers owned by 
companies that could change their terms of 
service, be bought out or go out of business. 

When the Pew Research Center, an 
independent fact-gathering organiza- 
tion in Washington DC, canvassed 1,600 
experts in 2014 about the future of the 
Internet, many expressed similar worries. 
“The realities of this data-drenched world 
raise substantial concerns about privacy 
and people's abilities to control their own 
lives,” according to the report. Those con- 
cerns have been compounded by some 
high-profile incidents, such as when 
users of Fitbit activity trackers allowed 
their activity logs to be publicly accessi- 
ble, unwittingly revealing when they had 
sex. When that was realized in 2011, Fitbit 
quickly took action to fix the problem. 

In another high-profile incident, the 
introduction of Google Glass headsets 
two years ago triggered concerns that 
users would capture images of passers-by without their knowledge. 
Researchers at the Center for Cybersecurity took this opportunity 
to apply their work on computer codes that enforce privacy policies. 
They built the wryly named FaceBlock app, which blocks out the faces 
of people who have requested privacy from photographs taken by 
Google Glass. But for this to work, a Google Glass owner would have 
to opt in by installing the app. So the only way for such a system to reli- 
ably provide privacy would be for manufacturers to make it standard 
and implement it with dedicated hardware, says Joshi. “Let’s say that 
Google was to build in a feature like this into every Google Glass so 
that it would automatically obey these kinds of commands — then it 
would work.”

Security concerns go hand in hand with privacy. Although encryption 
is becoming more pervasive and advanced, it is sometimes not used in 
low-cost wearable devices. Last year, researchers at the California-based
information-management company Symantec revealed that the location
of many health monitors, including some from market leaders, can
be easily tracked. And some of them wirelessly communicate passwords 
in clear text, which makes them vulnerable to hacking. Even if a health 
monitor is encrypted, the smartphone or hub device that links it to the 
Internet could also be a weak point, either because of unnecessarily 
broad permissions or because of malware. 

“If you're not encrypting the data you're definitely not secure,”
says Bogdan Carbunar, a security researcher at Florida International 
University in Miami. “Even if you're encrypting the data you can still 
not be secure.” Carbunar worked with a team, including a researcher 
from IBM, on security holes in two popular low-cost wearable fitness 
devices, the Fitbit Ultra and the Garmin Forerunner. They found that 
by impersonating the devices’ trusted webservers, they could fool the
gadgets into uploading false data — even nonsensical numbers such as 
millions of steps in one day (see M. Rahman et al. IEEE Trans. Mobile 
Comput. http://doi.org/636; 2015). 

The researchers also found that they could inject data onto a tracker 
of their own, which would compromise data accuracy, something that 
could become a problem if fitness data are tied to health-insurance 
premiums, as they have been in some companies. Fitbit told Nature 
that it had been aware of the problem, which has been addressed 
in subsequent products. Garmin did not respond to requests for 
comment. 

According to Carbunar, security adds costs for manufacturers 
in terms of money, development time, device size and power con- 
sumption. But researchers are pushing to minimize those costs. After 
working out how to hack the devices, Car- 
bunar and his team devised a way to keep 
them safe. They developed SensCrypt, 
an encryption protocol designed specifi- 
cally for low-energy fitness trackers that 
reduces communications costs. It uses a 
procedure called symmetric key encryp- 
tion to protect against remote attacks 
and to provide some security even if the 
device is stolen and tampered with. The 
researchers were unable to implement it 
on Fitbit or Garmin devices because they 
use closed-source code, but have tested 
their system on an open-source proxy. 

Even with high levels of encryption, 
devices could still be vulnerable to attack, 
says Bart Preneel, a cryptographer at the 
KU Leuven and iMinds research centre 
in Belgium. Preneel specializes in under- 
standing and preventing side-channel 
attacks: attempts by hackers to infiltrate 
mobile devices by detecting fluctuations 
in the power usage and using these to cal- 
culate encryption keys and other secure 
information. “These attacks can be made at a distance of 10 or
20 metres,” he says. This type of attack was discovered around 20 years
ago in relation to bank cards, but ways of preventing it are not imple- 
mented in many wearable devices, particularly implanted medical 
technology. 

Some companies have tried to improve security on mobile devices 
and wearables by equipping them with biometric devices such as 
fingerprint readers and iris scanners. But even these are insecure: 
researchers and hackers have shown how high-resolution cameras can 
capture someone's iris from a distance and how to steal a fingerprint 
using a phone's camera. 

But Preneel says that biometrics are promising for encryption if 
designers focus on measures that are not so easy to discover. There are 
already wearables that authenticate users on the basis of their heartbeat 
pattern. In the long run, Preneel envisages using internal signals from 
the body, such as DNA or the internal microbial community, to pair 
with wearable gadgets so that the devices would unlock only when in 
close proximity to the owner. 

With these kinds of improved security — and many upgrades in
communications networks — a lost tourist in the future would stand a better
chance of getting their wearables to work in a crowded plaza. Tom would 
easily be able to summon a map of the city on his lenses and would know 
his personal data were safely encrypted. Following the highlighted route, 
he might even make it to the station with enough time to get a coffee 
and charge his gadgets. It may not be the technological utopia imagined 
by some wearables enthusiasts, but at least he will catch his maglev.


Kat Austen is a freelance writer in Berlin. 
Institutions must do their
part for reproducibility 


Tie funding to verified good institutional practice, and robust science will shoot up 
the agenda, say C. Glenn Begley, Alastair M. Buchan and Ulrich Dirnagl. 


Irreproducible research poses an enormous burden: it delays treatments,
wastes patients’ and scientists’ time, and
squanders billions of research dollars. It is
also widespread. An unpublished 2015 survey
by the American Society for Cell Biology
found that more than two-thirds of respondents
had on at least one occasion been unable
to reproduce published results. Biomedical
researchers from drug companies have
reported that one-quarter or fewer of high-profile
papers are reproducible.

Many parties are addressing the problem.
Funding bodies such as the US National
Institutes of Health (NIH) have announced
training initiatives and explicitly instructed
grant reviewers to consider whether experimental
plans ensure rigour. New methods of
data analysis and peer review have been proposed
to reduce bias.

Several journals, including Nature and 
Science, have updated their guidelines and 
introduced checklists. These ask scientists 
whether they followed practices such as rand- 
omizing, blinding and calculating appropriate 
sample size. Science has also added statisti- 
cians to its panel of reviewing editors. Phil- 
anthropic and non-profit organizations have 
sponsored projects to improve robustness. 
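Calculating an appropriate sample size is one of the practices these checklists ask about. As an illustration only, the Python sketch below uses the standard normal approximation for comparing the means of two equal-sized groups with a two-sided test. The significance level (0.05), power (0.8) and standardized effect sizes are assumed values, not recommendations from the authors. The approximation slightly underestimates the exact t-test figure: for an effect size of 0.5 it gives 63 per group, against 64 from the t distribution.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.8) -> int:
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2, where d is the
    standardized effect size (difference in means divided by the common SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: {sample_size_per_group(d)} animals or patients per group")
```

Because n scales with 1/d², halving the expected effect size roughly quadruples the required sample, one reason small studies so often lack the power to detect what they claim.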

Funders’ policies, journal guidelines and 
widespread soul-searching are necessary. But
they are not sufficient. 

Conspicuous by their absence from these 
efforts are the places in which science is done: 
universities, hospitals, government-supported 
labs and independent research institutes. This 
has to change. Institutions must support and 
reward researchers who do solid — not just 
flashy — science and hold to account those 
whose methods are questionable. 


SPOT THE SHIRKERS 

Although researchers want to produce work
of long-term value, multiple pressures and
prejudices discourage good scientific
practices. In many laboratories, the
incentives to be first can be stronger than
the incentives to be right.

PRESSURED FINDINGS: A survey of US biomedical trainees suggests that the push to publish spurs unreliable results. Trainees reporting* (%) who: felt pressure to publish uncertain findings; felt pressure to support a mentor’s hypothesis even when data did not support it; knew of mentors who required lab members to have a high-impact publication before moving on.
*Online survey of ~140 trainees at the MD Anderson Cancer Center in Houston, Texas.

Discussions of conflicts of interest typically 
centre on relationships with industry, but aca- 
demic scientists face more pernicious, even 
existential temptations. Monetary rewards are 
often less important than the ‘currency’ with 
which scientists advance their careers: high- 
level publications lead to funding opportu- 
nities, promotions, awards and other forms 
of recognition. These markers of scientific 
achievement become proxies for assessment 
of the work itself, and further encourage spec- 
tacular, but less than substantiated, research. 

Amplifying these pressures is a human 
prejudice in favour of our own ideas. There is 
a very real temptation to ignore a result that 
does not conform to our preconceptions, or 
to recast it so that it does. Data-dredging is 
used to find statistically significant results that 
justify a publication. Sound practices such as 
blinding, multiple repeats, validated reagents 
and appropriate controls’ are dismissed as 
luxuries or nuisances. 

Research institutions contribute to and 
benefit from these perverse incentives. They 
bathe in the reflected glory of their faculty; 
they trumpet breakthroughs published in 
top-tier journals, lauding achievements 
to the media and donors. Some even pay 
investigators for publications. Many require 
that investigators generate their salary from 
research grants. 

An anonymous survey of around 140 train- 
ees at the MD Anderson Cancer Center in 
Houston, Texas, found that nearly one-third 
had felt pressure to prove a mentor’s hypoth- 
esis even though their experimental results 
did not support it, and nearly one-fifth had 
themselves published results they considered 
less than robust”. Nearly half knew of mentors 
who required lab members to publish a high- 
impact paper to complete training in their 
labs (see ‘Pressured findings’). 

Although important, the checklists intro- 
duced by journals do nothing to shift the 
focus from results to the legitimacy of the 
process by which the results are produced. 
Researchers encounter these lists after they
have drawn conclusions and are ready to
announce them — not when planning their 
research. There is no mechanism to verify that 
listed practices were actually employed. 

The core instinct of scientists — scepticism 
— is punished by the current system. Insti- 
tutions have a duty to reform it. They must 
shoulder their responsibility for training 
graduate students and postdoctoral fellows, 
for supporting the scientific behaviour of 
their faculty members and for the knowledge 
that emanates from their endeavours. 


GOOD INSTITUTIONAL PRACTICE 

Although there are some protections against 
outright fraud, few institutions have strong, 
transparent processes in place to discourage 
poor-quality science or to foster objectiv- 
ity. We propose that research institutions 
that receive public funding should apply 
the same kind of oversight and support 
to ensure research integrity as is routinely 
applied for animal husbandry, biosafety and 
clinical work. 

To conduct animal research, investigators 
must hold licences and undergo continuous 
education. Institutions appoint delegates to 
monitor compliance, and those delegates are 
held to account by regulators. Similar over- 
sight is used for work with radioactivity and 
human embryonic stem cells. These functions
could be broadened to encompass adherence
to guidelines for research conduct, such as
the ARRIVE (Animal Research:
Reporting of In Vivo Experiments) and
MIAME (Minimum Information About a
Microarray Experiment) guidelines, and
data sharing as required by the NIH and the
National Science Foundation.

Standards already exist that define good 
laboratory practice to test chemicals for 
toxicity, good manufacturing practice and 
good clinical practice. These systems were 
introduced to ensure a degree of consist- 
ency, quality and integrity. Procedures 
are in place to ensure compliance.

The scientific community should come 
up with a similar system for research, which 
we term good institutional practice (GIP). If 
funding depended on a certified record of
compliance with GIP, robust research would 
get due recognition. 

At a minimum, GIP should consist of the
following tenets. 


Routine discussion of research methods. 
Many labs already comb through data and 
methods as a group before submitting a 
paper. Such discussions should be broad- 
ened and formalized across an institution. 
Regular department and cross-department 
meetings should be established to dissect 
manuscripts in preparation. Methods and 
processes (rather than conclusions) would be 
debated just as a competitor’s paper might be 
critiqued in a journal club. Primary research 
material would be available. This practice 
is roughly analogous to the ‘Morbidity and 
Mortality’ conferences routine in hospitals, 
in which working hours are also intense. 

Regular critique sessions help scientists to 
learn to defend their science without feeling 
defensive. Investigators publicly hold each 
other to account, and trainees learn what 
to demand of their own research. Anxieties 
can be raised informally, highlighting institu- 
tional weaknesses and systematic errors. The 
practice also puts a short-term focus on what 
has traditionally been a long-term reward: a 
reputation for careful science. 


Reporting systems. Also well-established in 
clinical medicine is a system to anonymously 
flag occurrences that did or could have jeop- 
ardized a patient's care. Such systems are often 
the only way workers dare to raise concerns 
and admit mistakes. Similarly, colleagues,
graduate students and postdocs should be
able to discuss concerns about sloppy science
without jeopardizing their careers. Designated
co-mentors, a departmental ombudsman
or existing university offices of research
integrity could be charged with providing a 
forum for informal, confidential discussions. 
Any formal reports should be investigated in 
a balanced and impartial way. 


Training and standards. Some sloppiness 
stems from ignorance. Many investigators 
determine whether trainees are ready to move 
on by gauging the number and impact fac- 
tors of their publications; instead, supervisors 
should base such decisions on whether their 
lab members understand research methods 
and process. Compulsory institutional train- 
ing should ensure a common understanding 
of rigorous experimental design, research 
standards and objective evaluation of data. 
Faculty members and trainees should dem- 
onstrate their ability to spot problems such 
as ‘cherry picking’ data to make the best 
story. Compliance with research standards,
including data-sharing, should be supported, 
audited and acknowledged. 


Records and quality management. Labora- 
tory notebooks and records must be avail- 
able for independent review. Electronic 
laboratory notebooks facilitate collabora- 
tion, supervision and record keeping, and 
can link records to the original data. One 
of our institutions (U.D.’s) is now adopting
these system-wide. Random audits should 
be conducted to guarantee that experimen- 
tal data are duly recorded and that elements 
of good research practice are routine. Such 
spot-checks are commonplace in industry. 


Appropriate incentive and evaluation 
systems. Institutions should find ways to 
deter non-compliance with guidelines, poor 
mentoring and scientific sloppiness. Faculty 
members with poor records should face loss 
of laboratory space and trainees, decreased 
funding and potential demotion. Conversely, 
faculty members who excel as mentors and 
careful experimentalists should be rewarded. 
Appropriate metrics should be developed so 
that promotions are based on robustness and 
high-quality mentoring, rather than simply 
on high-profile publications’. Surveys such 
as that conducted at MD Anderson exem- 
plify one way in which administrators can 
gain the insight necessary to improve the 
research environment. Institution-level 
metrics could help to monitor overall per- 
formance and remind all researchers and 
administrators of their responsibility to the 
scientific community. 


Enforcement. Institutions should investigate 
egregious lapses and record them in a routine, 
transparent way. Departments of research 
integrity or other centres of excellence should 
be funded, staffed and given enough author- 
ity to prevent, detect, investigate and penal- 
ize poor-quality research. They should also 
be charged with promoting an institutional 
culture that nurtures robustness. 


GETTING TO GIP 

The systems needed to promote reproduc- 
ible research must come from institutions — 
scientists, funders and journals cannot build 
them on their own. These kinds of changes 
will require additional money, infrastructure, 
personnel and paperwork. The load on institutions
and investigators will be real, but so is
the burden of irreproducible research. Even if
it is accompanied by an apparent decrease in
productivity, the resulting increase in research 
quality will be well worth the costs. 

Still, most institutions will not make the 
necessary moves unless forced. Funding 
bodies should make GIP a prerequisite for 
receiving a grant. The concept has gained 
some traction: last year, Science Foundation 
Ireland announced plans to conduct external 
audits on some of the labs that it supports. 

There will not be one ideal solution. Faculty 
members, trainees and administrators will 
need to come together for honest, difficult 
discussions to restructure institutions. Nei- 
ther scientists nor institutions should engage 
in mere box checking; new practices must 
restrain sloppiness while interfering only 
minimally with the many scientists who are 
behaving well. 


Large-scale change is possible. In the 
1970s, clinical research had little rigour 
or oversight. Now clinical trials routinely 
include concurrent control groups, double- 
blinding, pre-specified endpoints, power 
calculations to determine patient numbers 
and analysis plans that thwart bias. In addi- 
tion, primary data are available for inde- 
pendent statistical analysis by regulatory 
authorities. At the time, these changes were 
controversial; many physicians believed 
them to be unnecessary and regressive. 

Nothing an institution can do will prevent 
misconduct altogether. This is not the goal. 
Rather, it is to support the work of well- 
meaning scientists, to reduce the waste from 
biased results, and to relieve some of the
pressures that encourage sloppy science.

Push renewables to spur
carbon pricing 


Make wind and solar power even cheaper by opening up access to the electricity grid 
and ending fossil-fuel subsidies, urge Gernot Wagner and colleagues. 


Putting a price on carbon dioxide and
other greenhouse gases to curb emissions
must be the centrepiece of any
comprehensive climate-change policy. We 
know it works: pricing carbon creates broad 
incentives to cut emissions. Yet the current 
price of carbon remains much too low rela- 
tive to the hidden environmental, health 
and societal costs of burning a tonne of coal 
or a barrel of oil’. The global average price 
is below zero, once half a trillion dollars of 
fossil-fuel subsidies are factored in. 


Momentum towards effective carbon pric- 
ing is building. California, joined by the Cana- 
dian province of Quebec, leads by pricing 85% 
of such emissions at around US$12 per tonne. 
Sweden applies the highest value globally on 
half of its carbon dioxide emissions, at up to 
$125 per tonne. The European Union has the 
largest system in terms of tonnes covered, 
pricing 45% of its greenhouse-gas emissions 
at about $8 per tonne. China is experiment- 
ing with regional cap-and-trade systems. And 
the US Clean Power Plan encourages states 
to meet emissions-reduction targets through
market-based mechanisms. Yet global emis- 
sions continue to climb. 

The current inadequacy of carbon pric- 
ing stems from a catch-22. Policymakers are 
more likely to price carbon appropriately if 
it is cheaper to move onto a low-carbon path. 
But reducing the cost of renewable energies 
requires investment, and thus a carbon price. 

China (left) has invested in renewables supply, whereas Germany (right) has subsidized demand.

In our view, the best hope of ending
this logjam rests with tuning policies to
drive down the cost of renewable power
sources even further and faster than in
the past five years. The cost of crystalline 
silicon photovoltaic (PV) modules has fallen 
by 99% since 1978 and by 80% since 2008; 
installation costs for wind power have also 
dropped, and solar and wind capacity has 
grown’ (see ‘The rise of renewables’). Prices
will continue to fall, but — without more 
help — the decrease will not be fast enough 
to make a dent in the climate problem. 

Some obstacles are technological; many 
are policy-driven**. Most energy regula- 
tions were set with the fossil-fuel industry in 
mind, and energy providers fight to preserve 
their existing assets rather than adapt to new 
conditions. More strategic coordination of 
energy resources, grid operation and climate 
policy is needed, with trade-offs kept in mind.


GRID OF POWER 
We call for policymakers to modernize and 
open up access to power grids, and to subsi- 
dize key technologies — particularly for stor- 
age. Renewables must have the same access 
to the grid as fossil sources; and grids must 
accept and manage distributed generation 
and intermittent flows. Other investments 
must include support for research, develop- 
ment and demonstration of energy storage 
and new low-carbon energy technologies. 
Barriers to international trade in renewable 
technologies and services must be lowered. 

We are in the middle of a low-carbon-energy
revolution. Germany has proved an
early driving force on the demand side and 
China has been strong on the supply side. 
Germany's Renewable Energy Sources Act, 
adopted in 2000, guaranteed 20 years of 
grid access and fixed prices for its solar- and 
wind-power producers. German electricity
consumers are subsidizing the expensive
early stages of the development, deployment
and integration of renewables to the tune of
more than $20 billion a year. In 2014, despite
the country exporting more electricity than
ever to its neighbours and phasing out
nuclear power, carbon emissions from the
German power sector were the second lowest
since 1990.

Meanwhile, China's climate, energy and 
industrial policies have boosted the manu- 
facturing scale of renewable technologies, 
expanding solar PV production more than 
100-fold since 2005 (ref. 5). As a result, PV- 
module prices have come down faster than 
anticipated. Other countries are taking note. 
More than half of US states are mandating 
an increase in the proportion of renewable 
power and have an incentive to expand such 
programmes under the Clean Power Plan. 
But ‘business as usual’ is not enough. Even
Germany — where solar energy meets more 
than 50% of national electricity demand on 
a sunny Sunday afternoon (when the sun is 
out and demand is low) — gets more than 
half of its annual electricity from coal and 
natural gas. Further reduction of fossil 
fuels relative to renewables is not assured. 
Fossil-fuel prices are volatile, and demand 
for renewables stalls when coal and natural 
gas are cheap. 

Poorly designed subsidies can be counter- 
productive because some low-carbon 
technologies perform better than others. 
Some forms of bioenergy may increase 
rather than reduce net emissions, owing to 
energy-intensive, fossil-fuel-based produc- 
tion processes and land-use changes, such as 
deforestation®. The reservoirs of hydropower 
dams may leak methane, and nuclear plants 
are expensive and carry large potential envi- 
ronmental risks. Still, the worst offenders are 
subsidies for conventional fossil fuels. 


OPEN EXPERIMENT 

The ideal solution is to vary the price of 
electricity by time and location, reflecting 
the full costs of generation and distribu- 
tion — including environmental costs. 
But that leads to another dilemma: proper 
pricing at all levels is politically and analytically
difficult. Compromises and alternative
instruments are needed. For example,
German feed-in tariffs that guarantee fixed
prices for renewable-energy generation led 
to large increases in solar and wind installa- 
tions. However, as the prevalence of renew- 
able energies increases, the system needs 
— and is undergoing — reform. With no 
single ‘best’ solution available, controlled 
policy experiments are needed. 

First, policymakers must check that 
interventions pass the benefit-cost test. 
Given how far the world remains from a sen- 
sible global climate policy, this is often a low 
threshold. Many direct subsidies that sup- 
port renewables — especially solar energy 
— are beneficial, not least because they spur
learning-by-doing’. 

Second, any renewables policy should 
make a national — and eventually global — 
carbon cap or tax more likely. If an interven- 
tion might derail such efforts, then stop. If 
it paves the way for stronger climate policy, 
try it. The Clean Power Plan, for example, 
encourages flexible, market-based ways of 
achieving emissions-reduction goals and cre- 
ates a framework for trading between states — 
a clear boon to sensible carbon pricing. 

Third, governments should break up 
non-competitive arrangements around grid 
access. Funding and regulation should sup- 
port the modernization of power grids to 
allow new renewable energy sources to be 
integrated. So that everyone pays their fair 
share towards the upkeep of the infrastructure,
grid users should be charged — but
with caution. High rates might provoke
some consumers to disconnect, increasing
costs for the rest. Boosting supply of renew-
ables during peak times might in turn lower
peak pricing and thus (perversely) decrease
overall incentives for renewables adoption’.

THE RISE OF RENEWABLES
Global wind- and solar-power capacities have grown by 40–50 gigawatts each year since 2008, with consumption also rising (1). Meanwhile, prices of photovoltaic (PV) panels and solar energy have fallen steeply since 2010 (2), in part driven by climate and energy policies and more-efficient manufacturing.
1 Consumption and capacity increasing (legend: wind, solar)

Fourth, the energy sector should be 
viewed in its entirety. For instance, increased 
electric-vehicle use could spread electricity 
demand more evenly throughout the day, 
flattening traditional peaks. It would also 
help to lower the prices of battery technolo- 
gies, hastening systemic change in the trans- 
port and electricity sectors. 

Ambitious renewables policies should be 
followed by strengthened climate policies. 
For example, rapid renewables deployment 
has reduced Germany’s carbon emissions 
but has not brought down the EU total, 
because German emissions are capped 
under the EU’s Emissions Trading System. 
The decrease in Germany, all else being 
equal, is compensated by emissions increases 
elsewhere under the cap. All else must not be 
equal. The cap ought to be tightened. 

These are the sorts of pieces that need to 
come together to deepen solar and wind pen- 
etration levels and achieve the ‘holy grail’ of 
climate policy: an effective carbon price.


Gernot Wagner is lead senior economist at 
the Environmental Defense Fund in Boston, 
Massachusetts, USA, and adjunct associate 
professor of international and public affairs 

COMPUTER SCIENCE


Enchantress of 
abstraction 


Richard Holmes re-examines the legacy of Ada 
Lovelace, mathematician and computer pioneer. 
Ada Lovelace, painted in 1835 by Margaret Sarah Carpenter.
The bicentenary of Augusta Ada King,
Countess of Lovelace, heralds the
critical reassessment of a remarkable
figure in the history of Victorian science. Ada 
Lovelace (as she is now known) was 27 years 
old and married with 3 children when she 
published the first account of a prototype 
computer and its possible applications in 
1843. Her 20,000-word paper was appended
as seven Notes to a translation of a descrip- 
tive article, Sketch of the Analytical Engine 
Invented by Charles Babbage, Esq. 

Lovelace’s account was the fruit of one 
of the most intriguing collaborations in 
the annals of science: her friendship with 
Charles Babbage, Lucasian Professor of 
Mathematics at the University of Cambridge, 
UK, and inventor of the landmark analytical 
engine. The Notes eventually brought Love- 
lace both acclaim and notoriety. Babbage 
himself described her unforgettably to the 
physicist Michael Faraday as “that Enchant- 
ress who has thrown her magical spell 
around the most abstract of Sciences and has 
grasped it with a force that few masculine 
intellects (in our own country at least) could 
have exerted over it”. 

The exact nature of that force and 
enchantment continues to puzzle histori- 
ans of science, not least because Lovelace’s 
correspondence, largely archived at the 
Bodleian Library in Oxford, has not been 
fully published (see selections by Dorothy
Stein in Ada, MIT Press, 1985; and Betty
A. Toole in Ada, Enchantress of Numbers,
Strawberry, 1992). What has emerged is the
hitherto unsuspected range of Lovelace’s 
interests and contacts, which linked the 
worlds of Victorian science and literature. 

Lovelace was the only legitimate child 
of the poet Lord Byron. She never met 
her father, self-exiled in Italy and Greece, 
but inherited much of his rebellious spirit 
and something of his unstable genius. She 
directed it towards science, declaring: “I do 
not believe that my father was (or ever could 
have been) such a Poet as I shall be an Analyst
(& Metaphysician); for with me the two
go together indissolubly”. 

She was brought up with pathological 
severity by her mother, the brilliant Lady 
Annabella Byron — dubbed “the Princess of 
Parallelograms” for her own fascination with 
mathematics — and a squadron of female 
advisers whom Lovelace christened the 
Furies. Forbidden to read her father’s poetry, 
young Ada was encouraged to study math- 
ematics, astronomy and music, and allowed 
to design flying machines, play the harp and 
commune with her cat, Puff. In her early 
twenties she began to study the new calculus 
under Augustus De Morgan, a proponent of 
Boolean algebra, who described her as poten- 
tially more promising than any ‘senior wran- 
gler’ or first-class Cambridge maths student. 

In spring 1834, Lovelace met her first great 
mentor: Mary Somerville, the translator of
French astronomer Pierre-Simon Laplace 
and author of the groundbreaking popular 
study On The Connexion of the Physical Sci- 
ences (1834). Somerville demonstrated that 
women could make their mark in science 
(R. Holmes Nature 514, 432-433; 2014). It 
was she who introduced Lovelace to Bab- 
bage at one of his champagne-and-science 
receptions in Marylebone, London, where 
Charles Darwin, astronomer John Herschel 
and geologist Charles Lyell were frequent 
guests. At these soirées, Babbage displayed 
a model of his early difference engine — a 
brass calculating machine capable of tabu- 
lating higher-order polynomial functions 
— alongside a silver automaton in the form 
of a dancing ballerina. Most guests were 
drawn to the ballerina; Lovelace, Babbage 
noticed, was entranced by the gleaming cogs 
of the calculating machine. Thus the unlikely 
friendship began. 

When Ada married William King, later 
Earl of Lovelace, in 1835, her London town 
house brought her even closer to Babbage. 
Their mathematical correspondence, both 
serious and teasing, focused on the analytical 
engine and the possibilities of mathematical 
and symbolic calculation. Thus in 1840 Love- 
lace was discussing the elimination game 
solitaire, in which 26 marbles must ‘jump’
each other, in an apparently unpredictable 
sequence, until only one remains. She chal- 
lenged Babbage to consider whether there 
could be “a mathematical formula... on 
which the solution depends, and which can 
be put into symbolical language”. She added, 
“Am I too imaginative for you? I think not.”

By 1841 Lovelace was developing a concept
of “Poetical Science”, in which scientific
logic would be driven by imagination, “the 
Discovering faculty, pre-eminently. It is that 
which penetrates into the unseen worlds 
around us, the worlds of Science.” She saw
mathematics metaphysically, as “the language 
of the unseen relations between things”; but 
added that to apply it, “we must be able to 
fully appreciate, to feel, to seize, the unseen, 
the unconscious”. She also saw that Babbage’s 
mathematics needed more imaginative pres- 
entation. So when a scientific paper on the 
analytical engine was published by Italian 
engineer Luigi Menabrea, Lovelace (per- 
haps inspired by Somerville’s translation 
of Laplace) translated it from the original 
French. A delighted Babbage encouraged her 
to add a commentary. When published in the
British journal Scientific Memoirs (volume 3, 
October 1843), Lovelace’s ‘translator’s Notes’ 
had expanded to twice the length of Menabrea’s
paper, and were

The Invention of Nature: Alexander von Humboldt’s New World
Andrea Wulf KNOPF (2015) 

Alexander von Humboldt (1769-1859) electrified fellow polymaths 
such as Johann Wolfgang von Goethe, discovered climate zones and 
grasped the impact of industrialization on nature. In her coruscating 
account, historian Andrea Wulf reveals an indefatigable adept of close 
observation with a gift for the long view, as happy running a series
of 4,000 experiments on the galvanic response as he was exploring
brutal terrain in Latin America. Most presciently, and at a time of 
fragmenting disciplines, he saw life as a “net-like intricate fabric” and 
brilliantly synthesized the sciences in his grand treatise Cosmos. 


The Only Woman in the Room: Why Science Is Still a Boys’ Club 
Eileen Pollack BEACON (2015) 

In the 1970s, Eileen Pollack was one of the first women to earn
a bachelor’s degree in physics at Yale University in New Haven,
Connecticut. Isolated and unencouraged, she abandoned dreams of
a life in cosmology and turned to writing. In this investigative memoir,
Pollack uses her own experience and interviews with students and
academics as a lens on gender and science. Many will wince in
sympathy over the biases and sexism that made Pollack’s academic
career a salmon run to nowhere, yet despite ongoing inequalities in
physics, she senses hopeful shifts in awareness.

Weatherland: Writers and Artists Under English Skies
Alexandra Harris THAMES & HUDSON (2015)

This is a gorgeous piece of writing, sure to grip bibliophiles and the
meteorologically inclined alike. Scouring English art and literature for
references to weatherscapes, Alexandra Harris has magicked them
into a subtle meditation on the nation’s changeable culture. Snippets
of science intersperse discussions of Shakespeare’s tempestuous
dramas, the “gothic fogs” of Charles Dickens’s 1853 Bleak House, the
rain-soaked revelations of poet Ted Hughes and more. Harris captures
the evanescent interplay of mind and sky, just as climate change
could muddy that relationship out of all recognition.


The Meaning of the Library: A Cultural History
Editor Alice Crawford PRINCETON UNIVERSITY PRESS (2015)

The current pressures on libraries give a poignant edge to this
chronicle, edited by research librarian Alice Crawford, which offers
rarefied glimpses of the institution through time. Historian Andrew
Pettegree reveals that printing contributed to the Renaissance
library’s decline; academic librarian Robert Darnton relates how
eighteenth-century booksellers went through hell and high mountain
passes to transport their wares; and English-literature professor
Laura Marcus surveys libraries in films such as Alain Resnais’s 1956
All the Memories of the World and Orson Welles’s 1941 Citizen Kane.


Waste to Wealth: The Circular Economy Advantage 

Peter Lacy and Jakob Rutqvist PALGRAVE MACMILLAN (2015) 

In this crisply lucid primer on the innovative sustainable-business 
model called the circular economy, Peter Lacy and Jakob Rutqvist 
make a business case for repurposing wasted resources, life cycles 
and embedded values such as unrecovered energy. They sketch in 
the historical background; discuss worked examples of business 
models such as the circular supply chain; describe the creation of 
“circular advantage”; and map out strategies for making the shift to 
full sustainability. Barbara Kiser 

Lovelace is sometimes loosely
described as the first computer program- 
mer. She did produce an elegant set of tables 
showing how the engine could calculate 
Bernoulli numbers, but based on equations 
supplied by Babbage. Lovelace’s original- 
ity lay in her conceptual definitions of the 
engine’s mathematical functions, and her 
brilliant speculations on its design possi- 
bilities, going far beyond anything Babbage 
himself articulated. She wrote: “We may say 
most aptly that the Analytical Engine weaves 
algebraic patterns just as the Jacquard-loom 
weaves flowers and leaves.” 

She distinguished it sharply from the 
difference engine and other “mere ‘calculating
machines’”, writing prophetically that it
“holds a position wholly its own... A new, a 
vast, and a powerful language is developed 
for the future”. Like the Jacquard loom, it 
used paper punchcards that could program 
variable settings into a mechanical proces- 
sor to be constructed from thousands of 
brass numerical cogs, vertically mounted 
in a system of calculating ‘barrels’, with 
‘loops and conditional branchings’ built in.
Despite having no defined power source, it 
was essentially the first genuine design for a 
working computer. 

Next, Lovelace pointed out its revolu- 
tionary potential to handle purely symbolic 
notations, which meant it could, in principle, win
games or compose music: “Supposing, for
instance, that the fundamental relations of 
pitched sounds in the science of harmony 
and of musical composition were suscepti- 
ble of such expression and adaptations, the 
Engine might compose elaborate and scien- 
tific pieces of music of any degree of com- 
plexity or extent.” 

Finally, she raised the question of whether 
the engine could think, but concluded that 
it “has no pretensions whatever to originate 
any thing”. This was to have huge
resonance. In his 1950 paper ‘Computing
machinery and intelligence’, mathematician
Alan Turing listed nine potential objections
to the possibility of artificial
intelligence (A. Turing Mind 59, 433-460;
1950). The sixth was “Lady Lovelace’s Objec- 
tion” that a machine cannot do anything new; 
he initially agreed, then wondered about 
it. “A better variant of the objection says 
that a machine can never ‘take us by sur- 
prise’... Machines take me by surprise with 
great frequency.” 

So does Lovelace. As her correspond- 
ence is gradually published, the extent of 
her scientific interests is emerging: they 
included railways, experimental telegraphy,
magnetism, animal intelligence, probability
theory and photography. Along with this
is the multitude of luminaries whose work
she knew, from Faraday to inventor Charles
Wheatstone, engineer Isambard Kingdom
Brunel, social theorist Harriet Martineau
and novelist Charles Dickens.

A portion of Charles Babbage’s never-completed analytical engine, with printing mechanism.

Lovelace planned to draft many other scientific papers, for example on consciousness. These ambitions, latterly fuelled by opium, became increasingly visionary: “My own great scientific object ... is the study of the Nervous System, and its relations with the more occult influences of nature.” She wished to become “a Newton for the Molecular Universe” of the mind. Another paper was to be on the revolutionary field theories of Faraday, to whom she wrote boldly: “I mean (unless you discourage me) to undertake your Researches for review, or at any rate as my hinge and centre for an Electrical Article.” By her thirties, aware of her own celebrity, Lovelace became increasingly provocative, speaking out on science, sex and life after death: “subjects few men and no women venture to touch upon”, as her father’s old friend, politician John Hobhouse, observed.

Lovelace died in great pain at 36, from 
uterine cancer. Almost her last independ- 
ent act was visiting the Great Exhibition of 
1851 with Babbage, to revel in scientific and 
technological advances — although both 
knew that the analytical engine would never 
be built in their own lifetimes. Both poetry 
and science attended her deathbed: Dickens 
came to read from his 1848 Dombey and
Son; Lovelace wrote a sonnet on Newton’s
rainbow for her own tomb. 

Lovelace’s long-term influence has 
been as much cultural as scientific. She 
may have partly inspired Princess Ida in 
Alfred Tennyson’s poem about a univer- 
sity of women, The Princess (1847). The 
precocious nineteenth-century teenager 
Thomasina Coverly in Tom Stoppard’s 1993 
play Arcadia, who understands chaos theory 
before it is established, was based on her. She 
is central to at least two science histories — 
James Gleick’s The Information (Pantheon, 
2011) and Walter Isaacson’s The Innovators 
(Simon & Schuster, 2014) — as well as to 
Sydney Padua’s hilarious (but scholarly) 
graphic novel, The Thrilling Adventures of 
Lovelace and Babbage (Particular, 2015). 

Since 2009, an international Ada Love- 
lace Day has been celebrated every October, 
to promote women in science. The annual 
Lovelace Medal is awarded by the academy 
of the British Computer Society. A major 
academic conference will be dedicated to her 
work at the Mathematics Institute in Oxford 
this December (see go.nature.com/sbcojl). 
As Lovelace once wrote to a startled Faraday: “I would not miss a possible opportunity of being ... useful to Science (Science whose Bride I am)!”


Richard Holmes is the author of The Age 
of Wonder, which won the 2009 Royal 
Society Prize for Science Books, and Falling 
Upwards. 

e-mail: richard.holmes@osb.me.uk 
Correspondence


Monitor ecosystem 
services from space 


We suggest that Earth 
observation should be used to 
monitor ecosystem services in 
the run-up to implementation of 
the United Nations Sustainable 
Development Goals (SDGs; see 
also A. K. Skidmore et al. Nature 
523, 403-405; 2015). 

A reliable Earth-observation 
framework would provide 
long-term spatial indicators 
of ecosystem services. It could 
capture changes in environmental 
and socio-economic features, 
for example by comparison 
with more than 40 years of 
Landsat satellite data and with 
information from new sensors 
such as the Sentinel fleet. 

Earth observation could inform 
management decisions about how 
to resolve conflicting objectives 
that arise from the SDGs (see, for 
example, R. Bosch et al. Nature 
523, 526-527; 2015). It would 
help in evaluating remote effects 
(teleconnections) for ecosystem- 
service provision and usage, such 
as whether biofuel production 
in one place creates biodiversity 
loss, pollution or deforestation 
elsewhere (see J. Liu et al. Science 
http://doi.org/627; 2015). 

We need new forms of data 
integration and case-study 
synthesis. Earth-monitoring 
systems must be developed with 
input from environmental and 
social scientists to link up with 
existing knowledge, for example 
by relating ecosystem services to 
biodiversity. 

Anna F. Cord, Ralf Seppelt 
Helmholtz Centre for Environmental 
Research (UFZ), Leipzig, Germany. 
Woody Turner NASA, 
Washington DC, USA. 
ralf.seppelt@ufz.de 


Biodiversity on canal 
route already at risk 


The Nicaraguan government is reviewing an environmental and social impact study for a proposed 300-kilometre canal to connect the Pacific and Atlantic oceans. As members of the specialist team who contributed to the ‘baseline’ biodiversity assessment for the study, we are in a position to respond to critics of the proposal (see, for example, A. Meyer and J. A. Huete-Pérez Nature 506, 287-289; 2014).

The internationally recognized environmental consulting firm ERM was commissioned to produce the study. Our impression is that ERM’s dealings with its local counterparts, the Nicaraguan government and the company that owns the canal concession, have been mutually transparent and professional.

Contrary to the depiction 
of the proposed canal route 
by Meyer and Huete-Pérez as 
a pristine wilderness, human 
impacts are strongly evident over 
its entire length, particularly 
from agriculture. This includes 
nationally and internationally 
protected areas and Lake 
Nicaragua, where several fish 
species are already in decline 
(T. B. Thorson Fisheries 7, 2-10; 
1982; and M. T. McDavitt Shark 
News 14, 5; 2002). 

We share many of the authors’ 
concerns for environmental 
integrity and biodiversity along 
the proposed canal route. 
However, there were huge losses 
to these even before the canal 
project began, and this needs to 
be factored into the discussion. 
Jeffrey K. McCrary* Nicaraguan 
Foundation for Integral 
Community Development, 
Managua, Nicaragua. 
jmecrary2@yahoo.com 
*On behalf of 4 correspondents (see 
go.nature.com/nu2cj for full list). 


Offsets: factor failure 
into protected areas 


Martine Maron and colleagues 
assume that a nation’s 
commitment to establishing 
protected areas of biodiversity 
provides a suitable baseline for 
determining the “additionality” 
of any offset initiative based on 
habitat protection (Nature 523, 
401-403; 2015). The evidence indicates otherwise.

A more realistic baseline 
would factor in the high 
probability that national 
biodiversity commitments will 
not be fulfilled (see M. Walpole 
et al. Science 325, 1503-1504; 
2009). For example, national 
conservation commitments can 
be overridden by development 
commitments. 

Documented trends and local 
conditions should be used to 
establish a baseline. Carbon 
offsets, for example, commonly 
derive baselines from historical 
average deforestation (see go.nature.com/rvdx3x). These
baselines are typically revised 
every ten years. 

We also disagree that 
developing countries should 
withdraw from the Convention 
on Biological Diversity (CBD) 
if they are unable to fund 
protected areas, because that 
would stop them engaging with 
other CBD targets. Moreover, 
honest accounting of offset 
benefits must occur at the 
local, regional and landscape 
levels where conservation is 
accomplished. 

What is most needed in 
offset programmes is better 
enforcement, so that they do 
not become a ‘licence to trash’ 
(see A. Villaroya et al. PLoS 
ONE 9, e107144; 2014). 

Joseph M. Kiesecker The Nature 
Conservancy, Fort Collins, 
Colorado, USA. 

Bruce McKenney The Nature 
Conservancy, Charlottesville, 
Virginia, USA. 

Peter Kareiva University of 
California, Los Angeles, USA. 
jkiesecker@tnc.org 


Galaxy γ-ray signal
was not oversold 


We argue that Jan Conrad’s depiction of our preprint (http://arxiv.org/abs/1503.02320; 2015) as a case study in ‘crying wolf’ lacks accuracy and credibility (Nature 523, 27-28; 2015).
Based on public data from NASA’s Fermi Large Area Telescope (LAT), we reported a γ-ray signal from the dwarf galaxy Reticulum II. Conrad characterizes our work as “the latest dark-matter discovery claim” and criticizes the “misuse” of public data at a time when an update from the Fermi collaboration “was imminent”.

Nowhere do we claim to 
have discovered dark matter. 
Rather, our paper is devoted 
to quantifying the probability 
that the observed signal is due 
to random fluctuation. Our 
closing paragraph says “it would 
be premature to conclude [the 
signal] has a dark matter origin”,
then identifies future work 
necessary to establish such a 
discovery. 

Our use of public data is concordant with the principles of ‘reproducibility’ that Conrad invokes. Nevertheless, he compares our work unfavourably to a paper by the Fermi-LAT and Dark Energy Survey (DES) collaborations, who calculate a larger probability of background fluctuation (see preprint at http://arxiv.org/abs/1503.02632; 2015 and A. Drlica-Wagner et al. Astrophys. J. 809, L4; 2015). Conrad did not disclose that he was initially an author on their submitted paper. He states that the Fermi-LAT/DES result is based on “more comprehensive re-analysis of the same data”; however, theirs is a separate analysis of different data that were released 15 weeks after both papers appeared, preventing confirmation of their results in the interim. Meanwhile, only our result was reproducible (see, for example, D. Hooper and T. Linden, preprint at http://arxiv.org/abs/1503.06209; 2015).

Moreover, our findings are now 
published in the peer-reviewed 
literature (Phys. Rev. Lett. 115, 
081101; 2015). 

Alex Geringer-Sameth* 
Carnegie Mellon University, 
Pittsburgh, Pennsylvania, USA. 
alexgs@cmu.edu 

*On behalf of 7 correspondents (see 
go.nature.com/n6gont for full list). 
For News & Views online, go to
nature.com/newsandviews 


NEURODEGENERATION 


Problems at the nuclear pore 


Expansion of a repetitive DNA sequence is associated with neurodegeneration. Three studies identify genes involved in
nuclear import and export that can mediate the toxicity this expansion causes. SEE ARTICLE P.56 & LETTER P.129 


BENNETT W. FOX & RANDAL S. TIBBETTS 


Advances in molecular genetics and technology are transforming our understanding of disease. Such progress is desperately needed in amyotrophic lateral sclerosis, a paralysing neurodegenerative disease that is almost uniformly fatal. It is therefore welcome news that three studies (two in this issue¹,² and one in Nature Neuroscience³) have converged on a molecular mechanism that seems to underlie a familial form of the disease.

Amyotrophic lateral sclerosis (ALS) is typically sporadic, but around 10% of cases are familial. Although mutations in more than a dozen genes can be involved, those in C9ORF72 are by far the most common cause of ALS, being responsible for approximately 40% of familial cases⁴. Mutations in C9ORF72 occur in a section of DNA in which a six-base unit, four guanines (G) followed by two cytosines (C), is repeated in tandem. This G4C2 hexanucleotide sequence is typically repeated two or three times, but can be expanded to tens or even thousands of repeats in people with C9ORF72-associated ALS (C9-ALS)°”*.
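To make the idea of a tandem repeat concrete, the sketch below counts the longest uninterrupted run of the GGGGCC unit in a DNA string. The function name `longest_repeat_run` and the toy sequences are hypothetical illustrations, not data or methods from the studies; sizing real expansions, which can run to thousands of repeats, is a much harder problem in practice.

```python
def longest_repeat_run(seq: str, unit: str = "GGGGCC") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`.

    Case-insensitive. Every start position is scanned, so the cost is
    roughly len(seq) * run length: fine for illustration, not for genomes.
    """
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        count = 0
        pos = start
        # Count consecutive copies of the unit beginning at this position
        while seq.startswith(unit, pos):
            count += 1
            pos += k
        best = max(best, count)
    return best


normal = "ATG" + "GGGGCC" * 2 + "TTA"      # toy sequence with 2 repeats
expanded = "ATG" + "GGGGCC" * 30 + "TTA"   # toy sequence with 30 repeats
print(longest_repeat_run(normal), longest_repeat_run(expanded))  # 2 30
```

A run that is broken by even one foreign base is counted as two shorter runs, which mirrors the notion of an uninterrupted tandem expansion.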

There are two leading models proposing how G4C2-hexanucleotide-repeat expansion (HRE) leads to neurodegeneration. One, the toxic RNA model, posits that G4C2 RNAs transcribed from the expansion bind to crucial RNA-binding proteins or other cellular factors, sequestering them and preventing them from functioning normally. The other model proposes that, through an unusual form of translation, these expanded RNA molecules produce toxic dipeptide repeat proteins (DPRs), such as strings of glycine and arginine (GR) amino acids, or of proline and arginine (PR). Central to both hypotheses is the fact that RNAs harbouring the HRE assemble into G-quadruplex structures that confer abnormal molecular behaviours’. There is experimental evidence to support both hypotheses* ”, although it is unclear whether DPRs are expressed at sufficient levels to contribute to toxicity”.

In the first of the three studies, Zhang et al.¹ (page 56) engineered Drosophila melanogaster fruit flies to express HREs of 30 repeats (termed (G4C2)30) in the flies’ eyes. G4C2 expansion in this system causes neurodegeneration and results in a ‘rough-eye’ trait that can be scored to identify mutations in other genes that lessen or worsen toxicity. Reasoning that proteins that bind to the G4C2 RNA are probably mediators of G4C2-associated toxicity, the authors crossed flies expressing (G4C2)30 with flies harbouring mutations in genes encoding G4C2-binding proteins identified through previous biochemical screens””®. Mutations that activated a gene called RanGAP strongly suppressed rough eye and neurodegeneration in (G4C2)30 flies. The RanGAP protein, which binds to G4C2 RNA, is located on the cytoplasmic face of the nuclear membrane, and is part of the nuclear-pore complexes (a nucleus has around 2,000 of them) that control the flow of proteins and RNA in and out of the nucleus. Zhang and colleagues then identified several other nuclear-import genes involved in (G4C2)30-elicited neurotoxicity.

Figure 1 | Models of G4C2-mediated neurotoxicity. The neurodegenerative disease amyotrophic lateral sclerosis can be caused by the expansion of a repetitive six-base DNA sequence (GGGGCC; G4C2) in the gene C9ORF72. Transcription of this G4C2 hexanucleotide repeat expansion (HRE) produces RNA that assembles into harmful G-quadruplex structures that can remain in the nucleus or be exported to the cytoplasm. Three studies¹⁻³ provide evidence suggesting that the G4C2 HRE RNA may cause cell toxicity in several ways: by binding to target proteins in nuclear-pore complexes in the nucleus, thereby disrupting nuclear export of RNA; by binding to the nuclear-import protein RanGAP outside the nucleus, preventing normal nuclear import; and by undergoing an abnormal form of translation to form toxic dipeptide repeat proteins (DPRs) that interfere with nuclear import, possibly through interactions with karyopherin proteins. Other effects of G4C2 HRE at both the RNA and protein levels remain unclear.
Freibaum et al.² (page 129) reached similar conclusions using a different strategy. They crossed flies expressing a (G4C2)58 HRE with flies missing defined chromosomal segments spanning from tens of genes to more than a hundred, and, using an iterative approach, homed in on genes whose deletion altered G4C2 toxicity. The authors identified numerous nuclear-import factors whose inactivation worsened the rough-eye trait of (G4C2)58 flies. The comprehensive screen revealed that changes to nuclear-export proteins also enhanced neurodegeneration and rough eye, suggesting that alterations in both nuclear import and export contribute to G4C2-mediated toxicity, at least in their system.

Both groups then probed exactly how G4C2 HREs disrupt nuclear trafficking. Zhang et al. showed that RanGAP bound G4C2 HREs in vitro, and corroborated this finding in vivo using neurons taken from patients with C9-ALS. HRE expression disrupted the nuclear import of fluorescent test substrates and of normal nuclear proteins, most notably TDP-43, which forms misfolded aggregates in the degenerating motor neurons of most people with ALS. Freibaum and colleagues observed nuclear-membrane irregularities in HRE-expressing cells and demonstrated that G4C2 HREs inhibit nuclear RNA export, an effect that was relieved by reducing the expression of genes that suppressed G4C2-mediated toxicity. Together, these findings established a strong connection between defective nuclear trafficking and neurodegeneration (Fig. 1).

Do DPRs contribute to G4C2 toxicity? Both studies detected G4C2-derived DPRs, but neither could show whether these DPRs contributed significantly to toxicity. In the third study discussed here, Jovičić et al.³ addressed this point directly, performing a genetic screen to identify genes that lessened or worsened the toxicity caused by a PR50 DPR in the yeast Saccharomyces cerevisiae. Because PR50 was expressed from synthetic DNA and not from a G4C2 HRE, toxicity should derive from the DPR itself, rather than from its parent RNA. Six of the strongest suppressors of PR50-associated toxicity in the researchers’ screen encoded members of the karyopherin family of nuclear-import proteins. The screen also suggested that the genesis of ribosomes (the cellular machinery that produces proteins) goes awry in PR50-expressing yeast. In the future, even more leads are likely to be mined from these genetic data.

These three studies take us to a higher plane of understanding of C9ORF72-associated ALS, with a focus placed squarely on the nuclear pore. For the future, the newly identified toxicity-suppressing genes will need to be tested in mammalian models of G4C2 expansion and DPR toxicity, probably using recently developed mouse strains”. The findings also raise the question of whether nuclear-trafficking defects contribute to neurotoxicity in other types of ALS. Neurons have a limited ability to replace damaged nuclear-pore complexes, and age-dependent decreases in nuclear integrity have been postulated as a risk factor for ageing-related disease”. Thus, enhancers of nuclear import should be tested in other ALS models, particularly those in which TDP-43 aggregation is observed.

The genetic studies have not resolved whether one mechanism of toxicity predominates in C9-ALS. At face value, the data suggest that G4C2-containing RNAs and G4C2-derived DPRs elicit toxicity through an overlapping set of nuclear-pore proteins. However, it remains possible that DPRs, which are also produced in the G4C2-expressing flies, contribute directly to their neurotoxicity. This question could be answered by investigating whether the nuclear-import enhancers picked up in the G4C2 screens can rescue neurodegeneration in flies expressing toxic DPRs. It will also be important to further characterize G4C2–RanGAP interactions, and to determine whether DPRs bind nuclear-pore proteins. Finally, because both DPRs and G4C2 HREs reportedly disrupt a subnuclear structure called the nucleolus!*”’, the relationship between this mechanism and nuclear-membrane defects should be deciphered.

Can our understanding of toxic G4C2 RNA be leveraged for therapy? Zhang et al. reversed the rough-eye trait by feeding (G4C2)30 flies either a compound that disrupts G4C2–RanGAP binding or a small-molecule inhibitor of nuclear export. The three studies also identified other genes that may be ‘druggable’, including those encoding proteins that oppose RanGAP. Development and preclinical testing of modulators of nuclear import or export are certainly warranted. No doubt, genetic studies such as the three discussed here will identify other nodes of therapeutic interest.
Frictionless fluids from
bacterial teamwork 


By increasing the sensitivity of an established technique, researchers have 
shown that swimming bacteria can make frictionless fluids — with potential 
applications in areas such as microfluidics. 


M. CRISTINA MARCHETTI 


The viscosity of a liquid is a measure of its resistance to flow. In general, denser fluids are more viscous and require more energy to get them to flow through a pipe. Flow with no energy dissipation is a hallmark of exotic states of matter such as superfluidity and superconductivity. Key to these exotic states are quantum effects that dominate at ultralow temperatures, turning liquid helium, for example, into a superfluid that flows without friction through cracks as thin as molecules. Writing in Physical Review Letters, Lopez et al.¹ demonstrate that Escherichia coli bacteria swimming in a fluid can organize themselves to counterbalance the energy loss resulting from viscous dissipation and thereby dramatically lower the fluid’s viscosity, driving it to vanish or even to become negative.

In 2004, it was predicted² that unicellular swimming organisms could substantially change the viscosity of a fluid on the basis of a hydrodynamic theory³ of active fluids (liquids consisting of self-propelled particles). This suggestion was supported by numerical solutions*”* of the theory, which revealed the possibility of vanishing viscosity for suspensions of motile bacteria. Pioneering experiments subsequently confirmed a reduction of viscosity in suspensions of the bacteria Bacillus subtilis® and E. coli’. A concurrent study® demonstrated the sensitivity of this effect to the microscopic cellular-propulsion mechanism by revealing an increase in viscosity for dilute concentrations of the alga Chlamydomonas reinhardtii; however, this behaviour remains puzzling.

Detailed calculations” of the response of dilute suspensions of swimmers to an externally imposed shear flow (which induces the velocity profile shown in Fig. 1a) have provided quantitative expressions for viscosity changes for small volume fractions of bacteria. Demonstrating that a bacterial suspension can achieve a state of vanishing, or even negative, viscosity was previously impossible, however, because this requires measurements of tiny shear stresses.

Figure 1 | Viscosity modulation by rod-like bacteria. a, Shear flow (blue arrows) can be imposed on a fluid between two plates by applying equal and opposite forces (black arrows) on the plates. Rod-like particles suspended in the liquid respond by orienting their long axes along the direction in which the flow tends to ‘stretch’ the fluid. b, Bacteria such as Escherichia coli are ‘pusher’ swimmers, which use flagella at their rear for propulsion.

Lopez and colleagues overcame this 
problem by adapting an old-fashioned 
rheometer — a device used to measure fluid 
viscosity. A simple rheometer consists of inner 
and outer cylinders that can rotate relative to 
each other. A fluid is placed in the annulus 
between the cylinders and one of the cylin- 
ders is rotated at a set rate, shearing the fluid. 
The liquid drags the other cylinder, exerting 
a torque on it. From measurements of this 
torque, one can infer the shear stress and thus 
the fluid viscosity, defined as the ratio of stress 
to the applied shear rate. 
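The definition of viscosity as stress over shear rate can be illustrated with the textbook narrow-gap estimate for a concentric-cylinder rheometer. This is a minimal sketch, assuming a Newtonian fluid, a gap much smaller than the cylinder radius and no end effects; the function `couette_viscosity` and its parameter names are hypothetical, and it is not the feedback-controlled, zero-torque procedure that Lopez et al. used.

```python
import math


def couette_viscosity(torque: float, omega: float,
                      r_inner: float, r_outer: float, height: float) -> float:
    """Estimate viscosity (Pa s) from a concentric-cylinder rheometer reading.

    Narrow-gap approximation (assumption): gap = r_outer - r_inner << r_inner,
    Newtonian fluid, no end effects. torque in N m, omega (rotation rate of
    the driven cylinder) in rad/s, all lengths in m.
    """
    gap = r_outer - r_inner
    if gap <= 0:
        raise ValueError("r_outer must be larger than r_inner")
    if omega == 0:
        raise ValueError("a nonzero rotation rate is needed to shear the fluid")
    # Shear stress on the wall: torque / (lever arm r * wetted area 2*pi*r*h)
    shear_stress = torque / (2 * math.pi * r_inner**2 * height)
    # Shear rate across a thin gap: wall speed divided by gap width
    shear_rate = omega * r_inner / gap
    # Viscosity is the ratio of shear stress to shear rate
    return shear_stress / shear_rate


# Hypothetical reading: 1 cm inner radius, 1 mm gap, 5 cm tall, 10 rad/s
print(couette_viscosity(math.pi * 1e-6, 10.0, 0.01, 0.011, 0.05))  # roughly 0.001 (water-like)
```

In this formula a measured torque of zero at a nonzero rotation rate gives zero viscosity, and a torque of the opposite sign gives a negative value, which is why resolving ultralow torques matters for detecting the states described below.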

The authors modified this device by control- 
ling the rotation of the inner cylinder using a 
computerized feedback mechanism. This 
set-up maintains zero torque, allowing highly 
sensitive measurements of ultralow shear 
stresses. The researchers also suspended 
bacteria in a medium that allows the microbes 
to remain motile but not to divide, enabling 
control of the bacterial concentration. 

Importantly, Lopez et al. were able to 
demonstrate the existence of states of arbitrarily 
small viscosity, in a regime in which the viscos- 
ity did not depend on the imposed shear rate 
and was therefore a legitimate material prop- 
erty. The development of a macroscopic device 
capable of sensing the rheological response of 
microorganisms is a remarkable experimental 
achievement. It paves the way for the quanti- 
tative characterization of the flow behaviour 
of a wide range of microorganisms, and for 
understanding the role of different propulsion 
mechanisms. 

Suspending non-motile particles in a fluid always increases the fluid’s viscosity. Albert Einstein provided the first quantitative formulation of this intuitive effect in 1906 by showing that, in a dilute suspension of spheres, the increase in viscosity is linearly proportional to the volume fraction of suspended particles’. So how do swimming bacteria achieve the opposite effect and thin out the suspension, turning it into a frictionless liquid akin to a superfluid?

The answer relies on two key properties of flowing suspensions. First, inactive rod-like particles in an externally imposed shear flow align their long axes along the direction in which the flow ‘stretches’ the fluid. The rods tilt at a fixed angle that depends on their ratio of length to width; this angle can be close to 45° for long, slender rods (Fig. 1a). Many unicellular organisms, including E. coli, have such a rod-like shape and therefore orient in this way in shear flow.

Second, swimming bacteria exert forces on the surrounding fluid. These forces come in equal and opposite pairs: the force from the beating of their propulsive appendages (flagella or cilia) is balanced by the viscous drag on the cell’s body. The spatial profile of these forces depends on the propulsion mechanism. Most bacteria use appendages mounted at the back of their bodies, and are known as pushers. When they move, they push fluid out at their front and back, while sucking it in at the sides.

Figure 1 (continued) | Pushers exert equal and opposite forces (black arrows) at their front and tail. In an external shear flow (blue arrows), they orient along the stretching direction of the flow and generate flow fields (green arrows) that further push the fluid in the same direction as the shear flow, reducing the fluid’s viscosity. Lopez et al.¹ report that suspensions of E. coli can generate fluids with zero, or even negative, viscosity.

Elongated pushers thus align their bodies along the stretching axis of the external flow and generate additional flows that further stretch the fluid in the same direction (Fig. 1b). At high enough concentrations, continuum theories suggest that the bacteria act collectively to push the fluid along, effectively thinning it. A microscopic understanding of how bacteria coordinate their response to shear to achieve a state of frictionless flow is still lacking. However, Lopez and co-workers’ experiments demonstrate that the viscosity of a suspension of swimming bacteria can indeed decrease with an increasing volume fraction of swimmers, within a range of bacterial concentrations.
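For contrast with this decrease, the dilute-suspension law for passive spheres credited to Einstein in the text can be written as below, where η₀ is the viscosity of the suspending liquid and φ the volume fraction of spheres (symbols chosen here for illustration). The coefficient 5/2 is the value Einstein gave in 1911 after correcting his 1906 calculation; the relation holds only for φ ≪ 1.

$$
\eta_{\mathrm{eff}} = \eta_0\left(1 + \tfrac{5}{2}\,\phi\right), \qquad \phi \ll 1
$$

For example, spheres occupying 2% of the volume (φ = 0.02) raise the viscosity by 2.5 × 0.02 = 5%, giving η_eff = 1.05 η₀. Because the correction term is positive for any φ > 0, passive particles can only thicken the liquid under this law; a net decrease with increasing volume fraction, as measured for swimming E. coli, requires the active stresses described above.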

By showing that bacteria can completely 
compensate for fluid friction by allowing the 
fluid to flow with zero dissipation, the authors 
have demonstrated that it is possible, in princi- 
ple, to extract useful macroscopic mechanical 
power from bacterial activity. This obser- 
vation is in line with earlier findings“ that 
bacteria can work together to turn micro- 
gears. Although harnessing bacterial power 
for macroscopic energy generation may still 
be a dream, it is not such a stretch to imagine 
that bacteria could be used as mixers to thin 
and stir the flow in capillary and microfluidic 
devices. Quantitative characterizations of 
rheology of the type pioneered by Lopez et al. 
pave the way to the development of bacterial 
baths tailored to mix and flow liquids for 
specific applications.
ECOLOGY
Global trends in plant 
naturalization 


Many naturalized non-native plants pose ecological and economic threats. 
A quantitative analysis of the global distribution of naturalized plants confirms 
some anticipated trends and exposes new patterns. SEE LETTER P.100 


MARCEL REJMÁNEK


Naturalized species are non-native species that form self-sustaining populations following their introduction into an area by human agency¹. Some naturalized species are considered a major threat to biodiversity and have been the focus of many biologists over the past three decades. However, even casual observers may notice that the distribution of naturalized species is highly uneven within and among different regions. Attempts to summarize global geographical distributions of naturalized organisms have included birds², ungulates³ (large mammals such as pigs and camels) and bryophytes⁴ (non-vascular plants), but a comprehensive assessment of naturalized vascular plants has been missing. In this issue, van Kleunen et al.⁵ (page 100) provide the first global analysis of the numbers and distributions of naturalized vascular plants and their exchange between continents.

The authors used hundreds of data sources 
of various kinds to characterize the alien flo- 
ras of 843 non-overlapping regions worldwide 
(481 mainland and 362 island areas). Charac- 
terization included the origin of the naturalized 
species and estimates of the numbers of native 
and non-native species per continent. The 
resulting database includes 13,168 plant species 
— 3.9% of the world’s currently known vascular 
flora — that have become naturalized in at least 
one region. The authors suggest that this figure 
may be an underestimate, given the lack of data 
(or adequate data) for some regions. 

One of the most striking results of this study 
comes from the authors’ comparisons between 
large continental areas. These revealed that 
North America has accumulated the largest 
number of naturalized species of vascular 
plant (5,958), followed by Europe (4,140). This finding undoubtedly reflects more intensive introduction processes, both deliberate (for example, for ornamental horticulture and erosion control) and accidental, as a result of frequent trade between these regions and the rest of the world.

Simple numbers of naturalized species do 
not, however, quantify the actual level of inva- 
sion. Previous work’ has shown that, in North 
America, non-native species account for 51.3% 
of the 120 most widely distributed plant spe- 
cies, but account for only 2.1% in Europe. 
One possible explanation for the striking dif- 
ference between Europe and North America 
is that the European flora, being part of the 
Eurasian flora, has been exposed to countless 
plant migrations over time, so that the result- 
ing plant communities are less ‘naive’ and more 
resistant to new plant invasions. It is also likely 
that some European plant species have been 
selected for quick colonization of human- 
disturbed habitats, the habitats in which they 
are most often found naturalized in North 
America (Fig. 1). 

Van Kleunen and colleagues’ data also show 
that the Pacific Islands region exhibits the 
steepest increase in the cumulative number of 
naturalized species with respect to the total area 
involved. This result provides the first global 
verification of an expected pattern: that oceanic 
islands harbour more naturalized plant species 
than mainland areas of similar size. A primary 
reason for this may be that native communi- 
ties on islands represent only a limited sample 
of the species that could potentially match the 
habitat, and they are therefore more open to the 
naturalization of introduced species. 

At the same time, the data confirm previous 
preliminary analyses showing that continental 
regions with large tropical areas (Africa, South 
America, tropical Asia) have fewer naturalized 
plant species than predominantly temper- 
ate regions. Higher resistance to non-native 
species establishment, faster vegetation 
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50 Years Ago 


It is probable that only those who 
have themselves been concerned 
with scientific research will 
appreciate all the fine nuances of 
Sir Cyril [Hinshelwood]’s address, 
but the picture he paints of the 
scientist as a creative worker, of the 
need for freedom of expression and 
appropriate conditions of work, and 
of public understanding if his work 
is to be fully effective, is intelligible 
to any layman. It is no picture ofa 
scientist working and living in some 
‘ivory tower; or even of Thomson's 
Newton, “stemming alone vast 
eternity’s unbounded sea’, but 
rather of a happy voyager of strange 
seas of thought, in company with 
others trained in the same or many 
other disciplines. 

From Nature 4 September 1965 


100 Years Ago 


In his presidential address, read
at the Association of Museums,
San Francisco, Dr. O. C. Farrington
gave an able summary of the
origin and evolution of natural
history museums, which should be 
widely read in this country. More 
especially is this to be urged in 
view of the danger which threatens 
such institutions in the immediate 
future in regard to the policy of 
national retrenchment which is 
now in process of formation. There 
is a danger that the pruning-hook 
may be used too ruthlessly, thereby 
inflicting material harm. For 
reformers are generally enthusiasts, 
and therefore are to be carefully 
watched, experience having shown 
that a sense of proportion is not 
usually among their attributes. 
Museums, as he remarks, are even 
now commonly regarded as a 
luxury, but he leaves no uncertainty 
as to the vitally important part 
which the modern museum plays, 
and must continue to play, in ever- 
increasing force, in our national life. 
From Nature 2 September 1915 
Figure 1 | A naturalized problem. Hypericum canariense L., a species of St John’s wort, is native to the Canary Islands, but has been introduced to and become naturalized in California (left). Almost no plants can grow beneath stands of this naturalized shrub (right). Van Kleunen and colleagues’ analysis of naturalized plants across the globe reveals that California is a region with one of the highest numbers of naturalized species. Most of them are incorporated into native, mostly highly human-disturbed, biotic communities without any obvious impacts. However, some, like H. canariense, are highly influential.

recovery following disturbance and lower introduction intensity in the tropical regions are the most common hypotheses explaining this phenomenon.

Another finding of van Kleunen and colleagues’ study is that the continents of the Northern Hemisphere have been the major sources of naturalized plant species for many areas of the Southern Hemisphere, but not vice versa. Thus, the study finally quantifies what Charles Darwin anticipated on the basis of the observations he made on his voyage around the world. Darwin proposed: “I suspect that this preponderant migration from north to south is due to the greater extent of land in the north, and to the northern forms having existed in their own homes in greater numbers, and having consequently been advanced through natural selection and competition to a higher stage of perfection or dominating power, than the southern forms.” But conclusively testing this hypothesis will be difficult, if not impossible: hundreds, at least, of phylogenetically related pairs of species from the Northern and Southern Hemispheres would have to be tested in well-designed competition experiments. Alternatively, the fact that the Southern Hemisphere is currently underrepresented as a source of naturalized vascular plants may indicate that southern continents provide many species that could spread to the northern continents in the future.

The quality of the data used for this synthesis could be much improved. For example, the definition of naturalized species is not always clear and consistent across different sources of data; not all sources are equally reliable; and the botany of some regions is much less well known than others. Nevertheless, van Kleunen and colleagues’ major qualitative conclusions seem robust. Two straightforward generalizations emerge from this and other studies: with an increasing number of introduced species, the number of naturalized species increases; and with an increasing number of naturalized species, the number of potentially harmful species also increases. Moreover, the number of naturalized plant species and their combined cover are usually positively correlated (at least in small plots in North America). It is probable that the data collected in this study will be used to test many interesting hypotheses and will improve our predictions of the future distributions of naturalized plant species.

SUPERCONDUCTIVITY
Extraordinarily conventional


Attitudes to high-temperature superconductivity have swung from disbelief
to a conviction that it occurs only ‘unconventionally’. But conventional
superconductivity is now reported at record high temperatures. SEE LETTER P.73


IGOR I. MAZIN


In 1911, the physicist Heike Kamerlingh Onnes was puzzled to observe that mercury became an ideal conductor below 4.2 kelvin. How could all the electrons in a metal cooperate so as to carry electric current without resistance? Common wisdom dictates that there is nothing ideal in this world. Nobody’s perfect! No crystals without defects can be created, no wheel can roll without friction, no glass can be 100% transparent. Yet subsequent experiments confirmed that the resistivity of many metals suddenly drops to exactly zero at a sufficiently low temperature. Chaotic motion introduced by heat destroys electronic cooperation, so for many years it was believed that this phenomenon, now known as superconductivity, was limited to ultra-low temperatures. But on page 73 of this issue, Drozdov and colleagues report a superconductor that works at about 200 K, a temperature that actually exists on Earth’s surface.

For decades, physicists were in the dark about the origin of superconductivity. The discovery in 1938 of another cooperative phenomenon, superfluidity in helium at 2 K, offered the first clue. This complete lack of viscosity turned out to be a direct consequence of quantum mechanics. All quantum particles are characterized by a ‘spin’ number; if this is an integer (as for helium-4 atoms), the particles can combine into a single object so large that it cannot be disturbed by such nuisances as friction or viscosity. This effect is called Bose-Einstein condensation.

But electrons, which conduct electricity, have a spin of ½, and so are not subject to Bose-Einstein condensation. In 1957, John Bardeen, Leon Cooper and Bob Schrieffer therefore proposed that the interaction of electrons with metal ions creates an effective attraction between electrons that binds them in pairs. These ‘Cooper pairs’ have a net spin of zero, and can form a Bose-Einstein condensate. This theory also allows the transition temperature, Tc, below which superconductivity occurs for a given metal, to be estimated.

Most elemental superconductors had been discovered by 1957, and all had Tc values of less than 10 K. For the next two decades, scientists worked with various compounds, but failed to increase Tc by even a factor of three. Not surprisingly, most physicists began to believe that nature imposes a fundamental, but as-yet unexplained, Tc limit of 25–30 K. The problem was succinctly formulated by the materials scientist Bernd Matthias in 1964: “Why has it been relatively easy, within the last 10 years, to reach transition temperatures of 17 to 18 K in many intermetallic systems and impossible to raise this value even by as little as half a degree?” Eight years later, Marvin Cohen and Phillip Anderson pointed out that if electrons interact too strongly with the ions in a metal, they can break the lattice apart. On this basis, they estimated that the highest Tc for conventional superconductors (those driven by the electron-ion interaction) is approximately 30 K.

Although the argument seemed convincing, some physicists remained hesitant. In the early 1970s, Vitaly Ginzburg, one of the top theorists of the time, organized a group in Moscow to explore routes to high-temperature superconductivity. One of his team’s principal results was that a key assumption by Cohen and Anderson was flawed, and that Tc could, in principle, be arbitrarily high even in a conventional superconductor.


Another prominent physicist who did not subscribe to the idea of a universal limit was Neil Ashcroft. In the late 1960s, he and Ginzburg proposed that, if hydrogen could become metallic, the energy of its ionic vibrations would be so high that even a moderately strong electron-ion coupling could result in a rather high Tc. Unfortunately, metallization of hydrogen has proved to be extremely difficult. It was then pointed out that hydrogen-rich compounds might be better targets, but it is only now that this idea has been realized, as reported by Drozdov and colleagues.
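Why light ions should help can be seen from the textbook weak-coupling BCS estimate. This is a scaling sketch only, assuming a single characteristic ionic vibration frequency ω_D, a dimensionless coupling strength N(0)V, and a fixed interatomic force constant so that ω_D scales with ionic mass M as M^{-1/2}; real hydrides are strong-coupling materials and are treated with first-principles methods instead:

$$
k_B T_c \approx 1.13\,\hbar\omega_D \exp\!\left(-\frac{1}{N(0)V}\right), \qquad \omega_D \propto M^{-1/2}
\quad\Longrightarrow\quad
\frac{T_c^{(\mathrm{H})}}{T_c^{(\mathrm{X})}} \approx \left(\frac{M_\mathrm{X}}{M_\mathrm{H}}\right)^{1/2}
$$

Holding the coupling and force constants fixed, replacing boron (about 10.8 atomic mass units) by hydrogen (about 1.0) would raise this estimate by a factor of roughly $\sqrt{10.8} \approx 3.3$. In real materials the coupling and the vibration spectrum change together, so the formula indicates only the direction of the effect, not a prediction.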

In the meantime, three major breakthroughs occurred in superconductivity. First, cuprate superconductors were discovered in 1986; within seven years, the Tc for these compounds reached 133 K (ref. 14). These have been recognized as ‘unconventional’ superconductors, driven by interactions among electrons rather than by electron-ion interactions.

The second was the discovery, in 2001, of magnesium diboride, a conventional superconductor whose Tc is 40 K. This relatively high number is due to the low mass of boron, and to the fact that strong electron-ion coupling is ensured because the conducting […] to be considerably more complex than for other conventional superconductors known at the time, but was understood within a year of the original discovery. At last, theorists could accurately calculate the critical temperature of a rather complicated material. This encouraged scientists to seek quantitative predictions for new superconducting materials.

The third breakthrough was the discovery of iron-based superconductors in 2008. These materials seem to be unconventional and, although of great interest, have never surpassed the Tc of the cuprates.

Drozdov and co-workers report a fourth breakthrough: superconductivity at approximately 200 K in a hydrogen-rich compound, sulfur hydride, at about 90 gigapascals, a pressure hardly achievable just a few years ago. Not only is this a 50% increase over the previous record for Tc, but the authors convincingly argue that the observed superconductivity is conventional, vindicating the ideas of Ashcroft and Ginzburg.

Moreover, this is the first time that a previously unknown material predicted to be a high-temperature superconductor has been experimentally confirmed to be one. A computational study of hydrogen-rich materials under pressure had reported that the sulfur hydride H₃S would be a superconductor with Tc in the range 190–200 K at 200 gigapascals, very close to the now-reported experimental value. Drozdov et al. studied H₂S, but it seems that at high pressure this decomposes into elemental sulfur and hydrogen-rich H₃S. It is therefore highly likely that the superconducting material is H₃S. More-accurate calculations yielded a Tc value approximately 20% higher than in the earlier computational study. There is some disagreement about which small effects, not accounted for in standard computations, are responsible for this overestimate, but it is amazing that theorists quibble about a 20% inaccuracy in first-principles calculations when even an order-of-magnitude estimate was considered practically impossible only 40 years ago.

In 1796, the philosopher Wilhelm Hegel 
introduced the concept of spiral progress: an 
intellectual proposition is superseded by its 
negation, but later the negation itself is negated; 
the original thesis is then reinstated, but at a 
higher level of development. The generality of 
this concept can be philosophically disputed, 
but Hegel’s idea seems to be confirmed by the 
fact that the holy grail of superconductors has 
been discovered in the same group of materi- 
als as the first known superconductors, after a 
tiresome quest along exotic routes.
Location affects sporulation


Monitored changes in the number of copies of a gene during DNA replication 
control the timing of sporulation in bacteria. This discovery links replication to 
the concept that a gene’s location on a chromosome can influence cell traits. 


BETH A. LAZAZZERA & DIARMAID HUGHES 


For decades, it has been known that the location of a gene on its chromosome can influence the level at which it is expressed. Most bacterial chromosomes are circular, and their replication begins at a single bidirectional origin. As such, during chromosome replication, genes close to the origin of replication will be transiently present in more copies (present at a higher dosage) than those close to the terminus of replication. Altering the distance of a gene from the origin of replication systematically alters its level of expression during the cell’s replication cycle. But until now, research on the significance of gene location has largely focused on whether highly expressed genes are preferentially located in the origin-proximal half of the chromosome, because this provides the cell with a growth advantage due to a positive gene-dosage effect. Writing in Cell, Narula et al. report a new twist on the role of chromosomal location in gene function: coordinating sporulation with chromosome replication in the bacterium Bacillus subtilis.

When starved, B. subtilis can initiate a cascade of protein phosphorylation that leads to sporulation, producing a dormant spore that is resistant to starvation conditions and that can eventually resume growth under favourable conditions. The first components of this phosphorelay are a kinase enzyme called KinA and a response-regulator protein, Spo0F. In vitro evidence has suggested that, although phosphorylation of Spo0F by KinA is necessary for the activation of early sporulation genes, high concentrations of Spo0F can also inhibit the activity of KinA. Narula et al. confirmed this result in vivo, demonstrating that high levels of Spo0F induce a negative-feedback loop that inhibits the phosphorelay.

The spo0F gene is located near the origin of replication, whereas the kinA gene is located close to the replication terminus. Narula et al. report that the positions of spo0F and kinA seem to be crucial for their ability to efficiently regulate sporulation. Because of their respective locations, during replication there is a temporary twofold increase in the dosage of spo0F relative to kinA (Fig. 1a). By using computer simulations and then verifying their models in vivo, the authors showed that the transient increase in Spo0F concentration inhibits KinA until replication is completed, leading to pulsing dynamics of sporulation-gene expression during each cell cycle (Fig. 1b). Cells will only sporulate once they cross a threshold level of sporulation-gene expression, which is achieved through a positive-feedback loop that increases KinA levels, a process that takes several rounds of cell division.

Figure 1 | A genetic imbalance regulates sporulation. The bacterium Bacillus subtilis sporulates by activating a phosphorylation cascade, which begins with phosphorylation of the protein Spo0F by the enzyme KinA. a, The spo0F gene is located close to the origin of DNA replication on the B. subtilis chromosome, whereas kinA is close to the replication terminus. During replication, the cellular concentration of Spo0F transiently increases relative to the level of KinA, owing to a difference in gene copy number. Narula et al. report that this discrepancy induces a negative-feedback pathway that inactivates sporulation genes. Once replication is complete, the disparity is resolved, and sporulation-gene-expression pathways are activated. b, As such, sporulation genes are activated in pulses during sequential cell cycles. Green indicates that the chromosome is fully replicated; background colour indicates partial replication. Once a threshold level of expression is reached (dotted line), sporulation occurs.

Narula and colleagues then performed translocation experiments, in which they moved spo0F or kinA towards the terminus or origin of replication, respectively. These translocations abolished pulsing, confirming that a transient imbalance in the dosage of the two genes is necessary for pulsing of early sporulation-gene expression and for proper coordination of the sporulation program with DNA replication. These data, together with the authors’ finding that the relative locations of kinA and spo0F are similar in 45 other species of sporulating bacteria, show for the first time that the siting of interacting genes at different locations on the chromosome could have evolved as a way of controlling how the gene products function.

Monitoring chromosome replication status is crucial for many species. In the case of B. subtilis, initiation of sporulation without complete chromosomes for both the mother cell and the future spore cell would be a waste of resources. It has long been known that a checkpoint is activated to inhibit sporulation when DNA is damaged or replication is defective. Narula and colleagues have identified a remarkably simple mechanism by which cells can monitor the replication status of the chromosome.

The regulatory mechanism presented in this study deepens our understanding of the potential variety of mechanisms that might regulate changes in cellular traits. But the work also raises several interesting avenues for further investigation. For example, it is unclear whether this particular situation is a biological one-off. It seems more likely that there are other traits, both in B. subtilis and in other organisms, that are regulated by temporal variations in gene-product ratios associated with gene location.

It also remains to be seen whether more-complex versions of this mechanism exist, involving more than two genes, and whether such mechanisms could be involved in replication fidelity. For instance, could this type of regulatory mechanism act as a brake on chromosomal rearrangements such as inversions, which might disrupt the relative locations of genes in regulatory networks that rely on dosage imbalances?
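The dosage arithmetic behind the spo0F/kinA imbalance can be sketched in a few lines. This is a toy model, not Narula and colleagues' simulation: it assumes a single round of bidirectional replication with forks moving at constant speed, gives gene positions as hypothetical fractions of the origin-to-terminus distance (0 at the origin, 1 at the terminus), and ignores expression and protein dynamics entirely. It also shows why moving one gene next to the other, as in a translocation, removes the imbalance window.

```python
# Toy model of replication-associated gene dosage in a bacterium with one
# bidirectional origin. Positions are hypothetical fractions of the
# origin-to-terminus distance (0.0 = origin, 1.0 = terminus).

def copy_number(position: float, fork_progress: float) -> int:
    """Copies of a locus during a single round of replication.

    Assumes forks move at constant speed from the origin, so a locus is
    duplicated once fork_progress (on the same 0-1 scale) reaches it.
    """
    return 2 if fork_progress >= position else 1


def dosage_ratio(pos_a: float, pos_b: float, fork_progress: float) -> float:
    """Copies of gene A divided by copies of gene B at one fork position."""
    return copy_number(pos_a, fork_progress) / copy_number(pos_b, fork_progress)


def imbalance_fraction(pos_a: float, pos_b: float, steps: int = 1000) -> float:
    """Fraction of the replication period during which the two doses differ."""
    unequal = sum(
        1 for i in range(steps) if dosage_ratio(pos_a, pos_b, i / steps) != 1.0
    )
    return unequal / steps


if __name__ == "__main__":
    # Hypothetical positions standing in for spo0F (near origin) and kinA (near terminus).
    spo0f_pos, kina_pos = 0.05, 0.95
    for progress in (0.0, 0.5, 1.0):
        ratio = dosage_ratio(spo0f_pos, kina_pos, progress)
        print(f"fork at {progress:.1f}: spo0F/kinA dose = {ratio}")
    print("native layout, imbalanced fraction:", imbalance_fraction(spo0f_pos, kina_pos))
    # A hypothetical translocation placing spo0F at kinA's position removes the window.
    print("spo0F moved next to kinA:", imbalance_fraction(kina_pos, kina_pos))
```

With the genes at opposite ends, the dose ratio rises from 1 to 2 while the forks travel between them and returns to 1 when replication finishes; placing the genes together leaves the ratio at 1 throughout.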

Narula and colleagues’ work illustrates 
the potential importance of gene location in 
perhaps unexpected aspects of cell biology. 
It will doubtless motivate future experiments 
in chromosome remodelling. Perhaps it will 
also prompt a re-examination of old data, 
to assess whether arbitrary choices made 
in genetic engineering might have affected
experimental outcomes.
The diversified economics of soil water


Soil water that evaporates or is tapped by plants is largely separate from that 
which runs into streams and recharges groundwater. This finding has big 
implications for our understanding of water cycling. SEE LETTER P.91 


GABRIEL BOWEN 


Soils can be viewed as the investment managers of the terrestrial water cycle: they accept precipitation capital from the atmosphere and allocate it to sustain and grow various biological and hydrological stocks. These water investments influence plant productivity, run-off to streams and groundwater, and atmospheric humidity, so deciphering how soils partition water is vital if we are to understand and predict the function of these systems. On page 91 of this issue, Evaristo et al. suggest that the allocation of water in most soils worldwide follows a conservative, diversified ‘strategy’, in which new resources are invested as they are obtained, and transfer of capital between accounts is limited.

Water researchers have considered two 
contrasting scenarios for soil-water allocation. 
The ‘commingled’ scenario, which is the one 
most widely adopted in hydrological models 
(see refs 2 and 3, for example), assumes that all 
water is held in a common pool and is withdrawn
only as needed. Residual water from
past precipitation is tapped by plants, evapo- 
rates or hosts biogeochemical reactions until 
fresh precipitation displaces some or all of it 
into groundwater reservoirs or streams. This 
situation has been referred to as hydrological 
connectivity, because water leaving the soil in 
any form is drawn from a common pool and is 
connected to all other flows. 

The contrasting scenario could be said to be ‘diversified’, because new water is allocated to one of several pools as it enters the soil, and transfers between these pools are limited. Previous hydrological research has provided hints of a diversified approach to soil-water investment. For example, in many soils a substantial fraction of infiltrating water moves rapidly through large pores to recharge groundwater and produce stream run-off. There is surprising evidence that this separation of soil-water pools extends to water withdrawal by plants, with trees and shrubs in two ecological systems drawing from a soil-water pool that is apparently distinct from that feeding recharge and run-off. The generality and importance of such diversification have been unclear, however, because relatively few studies have been conducted.

Evaristo and colleagues provide compelling 
evidence that the diversified mode is wide- 
spread, if not ubiquitous. The authors adapt 
previously reported methods that capitalize
on a distinctive shift in the ratios of hydrogen 
isotopes and oxygen isotopes in soil water as 
it evaporates from soils. If the water in soil, 
plants, groundwater and streams all showed 
a common evaporation shift, this would 
strongly suggest a commingled situation 
(Fig. 1a). But in a meta-analysis of data from 
47 studies spanning multiple environments 
and biomes, the authors instead find a similar 
evaporation shift in soil and plant water, but 
little or no shift in streams and groundwater 
(Fig. 1b). This implies diversification: plants 
and soil evaporation across ecosystems seem 
to be tapping a pool of water that is largely 
separate from the pools that generate run-off 
and recharge. 
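The evaporation shift can be made concrete with a small sketch. It assumes Craig's Global Meteoric Water Line (δ²H = 8·δ¹⁸O + 10‰) as the reference for unevaporated precipitation and reports each sample's vertical offset from that line. Evaristo and colleagues' actual procedure may differ (for instance, by using local water lines), and the sample values below are hypothetical.

```python
# Sketch: how far a water sample sits below or above a meteoric water line.
# Assumed reference: Craig's Global Meteoric Water Line, d2H = 8 * d18O + 10 (per mil).
GMWL_SLOPE = 8.0
GMWL_INTERCEPT = 10.0


def line_excess(d18o: float, d2h: float,
                slope: float = GMWL_SLOPE,
                intercept: float = GMWL_INTERCEPT) -> float:
    """Vertical offset (per mil) of a sample from the reference line.

    Values near zero resemble unevaporated precipitation. Evaporating water
    moves along a shallower slope than the meteoric line, so evaporated
    samples fall below it and give negative values.
    """
    return d2h - (slope * d18o + intercept)


if __name__ == "__main__":
    # Hypothetical (d18O, d2H) pairs in per mil, chosen only to illustrate the contrast.
    samples = {
        "stream": (-8.0, -54.0),  # sits on the reference line
        "soil": (-4.0, -38.0),    # stream value enriched along a slope-4 evaporation line
    }
    for name, (d18o, d2h) in samples.items():
        print(f"{name}: offset {line_excess(d18o, d2h):+.1f} per mil")
```

In this toy case the 'soil' sample falls 16‰ below the line while the 'stream' sample sits on it, the diversified pattern; in a commingled soil, plant, soil, stream and groundwater samples would all show similar offsets.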

Figure 1 | Evaporation shifts in two models of soil-water allocation. When the ratios of the abundances of hydrogen and oxygen isotopes in soil water are plotted against each other, different data distribution patterns are expected, depending on how the water is partitioned for use. a, In the ‘commingled’ scenario, soil water is held in a common pool. The distributions associated with water used for different purposes (water tapped by plants, run-off to streams and groundwater, or evaporated water) are shifted by similar amounts from the distribution associated with precipitation. b, In the ‘diversified’ scenario, new water is allocated to one of several pools as it enters the soil, and transfers between these pools are limited. Evaristo and colleagues’ analysis of soil-water data suggests that a diversified mode dominates worldwide: the distributions associated with plant water and evaporated water are shifted by similar amounts from the precipitation values, but the distribution associated with run-off is not shifted by much.

This finding has major implications for, and raises many questions about,
our understanding of water cycling. At a 
fundamental level, the diversified mode is 
conservative in that water is allocated to 
support multiple uses, with less water avail- 
able than in the commingled model to support 
run-off or plant growth individually. The 
mechanisms that maintain the segregation 
of these pools of water as they move through 
the soil matrix remain unresolved, but under- 
standing them is crucial for developing accu- 
rate models of soil-water partitioning. In 
particular, the relative roles of physical and 
temporal segregation remain unclear. Do
plants draw water from different parts of the
soil matrix than those that feed groundwater
recharge, or do plant withdrawals happen at a
different time from groundwater recharge?

The suggestion that plants that span many 
biomes have developed strategies of water use 
that focus on one pool of soil water largely 
to the exclusion of another is intriguing, 
given that drought stress is a major driver of 
plant mortality. Perhaps the explanation
lies in the transient nature of the soil-water 
stocks that contribute to recharge and run-off. 
Evaristo and co-workers’ observation of the 
pervasive pattern of association between plant 
and soil water, but not with run-off-generating 
water, calls for further research. However, 
the isotopic methods used by the authors 
are less useful in the study of non-woody 
species such as grasses, so extending this 
work to some fast-growing and potentially 
less-discriminative water users will require 
new approaches. 

The lack of water exchange between soil 
pools also calls into question some previous 
analyses of water-cycle processes (see ref. 8, for 
example), because it implies that methods for 
studying water partitioning that use measure- 
ments of chemical or isotope tracers in streams 
may be blind to the part of the soil-water balance
sheet that involves plants and soil evaporation.
Indeed, a study published earlier this
year found that balancing the global water- 
isotope budget requires widespread hydro- 
logical separation in soils, consistent with 
Evaristo and colleagues’ results, and called 
for a revision of previous global flux estimates 
from studies that did not consider hydrologic 
separation. 

Finally, water allocation is inextricably 
coupled with soil biogeochemical reactions, 
from rock weathering to nutrient cycling, and 
the effect of diversified soil-water allocation on 
these processes may be enormous. Many reac- 
tions occur in thin films of water surround- 
ing mineral grains. If these films are part of 
a long-lived soil-water pool, and there is little 
physical exchange of this water with through- 
flowing, run-off-generating water, what are 
the implications for the transfer of the reac- 
tion products to groundwater and streams? 
In this sense, the ‘trickle-down’ effects of soil-water economics may structure the entire soil chemical system. Better understanding of the processes governing soil-water partitioning may ultimately help to resolve long-standing problems in geochemistry, such as discrepancies between field- and laboratory-based mineral-weathering rates, which are central to our understanding of the global carbon cycle and to proposed geoengineering strategies for coping with anthropogenic emissions of carbon dioxide.
Unequal opportunity during class switching


The DNA breakage-and-repair mechanism that generates antibodies of different 
classes has, in theory, a 50% chance of occurring correctly. But this recombination
turns out to be heavily biased towards productive events. SEE LETTER P.134 


JAVIER M. DI NOIA 


Antibodies are proteins that recognize and neutralize invading pathogens. To accomplish this, antibodies (also called immunoglobulins) must access different tissues and recruit killer molecules and immune cells. These effector functions depend on the antibody’s ‘constant’ region, which varies in protein sequence between the different classes of immunoglobulin: IgM, IgG, IgA and IgE. To produce antibodies appropriate to a particular infection, the class can be changed through an inducible genomic rearrangement called class-switch recombination (CSR). This process can occur in two orientations, one of which results in a ‘productive’ rearrangement whereas the other prevents antibody production. Theoretically, these two events have an equal probability of occurring, which would give CSR a 50% failure rate that would limit the efficiency of antibody responses. But in this issue, Dong et al. (page 134) demonstrate that CSR is highly non-random, with 90% of events resulting in a functional rearrangement.

Antibodies of the IgM class are the first to be 
produced when B cells of the immune system 
are stimulated by encounter with a pathogen. 
As the immune response progresses, the IgM 
constant region is replaced by another one, 
depending on the infection: IgG antibodies 
are effective against viruses and bacteria and 
are the antibodies induced by vaccination; IgA 
antibodies protect mucosal surfaces; and 
IgE antibodies attack certain parasites.

The constant regions defining the antibody classes are all encoded by the antibody heavy-chain gene (Igh), and the sequence for each is preceded by a distinct repetitive ‘switch’ (S) sequence: Sμ, Sγ, Sα and Sε (Fig. 1). During CSR, the enzyme activation-induced deaminase (AID) causes simultaneous DNA double-strand breaks (DSBs) at Sμ and at another S region. The DSBs, which can be up to 200 kilobases apart, are then joined by non-homologous end joining (NHEJ), a ubiquitous pathway for repairing broken chromosomes. For productive CSR, the broken ends of the two separate DSBs must be joined in the orientation that places the new constant region in place of that of IgM and circularizes and deletes the intervening sequence (Fig. 1). Joining in the other orientation inverts the region between the DSBs, inactivating the antibody gene.

By adapting a high-throughput technique
that enables analysis of the sequence and 
relative orientation of massive numbers of 
junctions between two DSBs, Dong et al. 
determined the relative frequency of deletion 
compared with inversion after inducing CSR. 
Their results show unambiguously that CSR is 
heavily biased towards the productive orienta- 
tion, which they found in more than 90% of 
the joins between two switch regions broken 
by AID. 
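The read-out of such an assay is, in essence, a count of deletional (productive) versus inversional junctions. A minimal sketch of turning counts like these into an estimate of orientation bias follows. The junction counts are hypothetical, not data from Dong et al., and the Wilson score interval is one standard choice for a binomial proportion, not necessarily the statistic the authors used.

```python
import math


def productive_fraction(deletions: int, inversions: int) -> float:
    """Fraction of joins in the productive (deletional) orientation."""
    total = deletions + inversions
    if total == 0:
        raise ValueError("need at least one junction")
    return deletions / total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical junction counts, for illustration only.
    deletions, inversions = 900, 100
    frac = productive_fraction(deletions, inversions)
    low, high = wilson_interval(deletions, deletions + inversions)
    print(f"productive fraction {frac:.2f} (95% CI {low:.3f} to {high:.3f})")
    # Unbiased joining would put the productive fraction at 0.5.
    print("consistent with unbiased 50:50 joining?", low <= 0.5 <= high)
```

With a thousand junctions, a 90% productive fraction lies far outside the range expected from unbiased joining, which is why large junction counts make such a bias unambiguous.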

The authors further explored this finding by 
introducing into the genome of antibody-pro- 
ducing B cells sequences that are recognized 
Figure 1 | Biased recombination. The Igh gene contains a region encoding the variable (V) region of an antibody (immunoglobulin) and several regions encoding the antibody’s constant (C) regions, each of the latter preceded by a switch (S) region. The C region that is expressed determines the immunoglobulin class. The default class, IgM (encoded by Cμ), can be changed to another class (IgG, IgE or IgA, encoded by Cγ, Cε or Cα) through class-switch recombination (CSR). In this process, the enzyme AID induces double-strand DNA breaks at the Sμ region and at another S region. The broken DNA is then rejoined through non-homologous end joining (red arrows). If the broken strands are oriented such that Cμ is replaced by another constant region, the intervening DNA circularizes and is excised, and a productive antibody-encoding sequence is generated. If the strands are joined in an inverted orientation, no antibody is produced. Dong et al. show that normal CSR is more than 90% biased towards the productive orientation. The authors present evidence suggesting that the topology of Igh during CSR and the DNA-repair factor 53BP1 are the primary contributors to this orientation bias.


by the enzyme I-SceI (a yeast enzyme that
induces DSBs in DNA). They then looked at
the orientation outcomes when one or both
DSBs were made by I-SceI at a non-switch-region
sequence, compared with breaks created
by AID at an S region. It had been shown
previously that replacing both S regions with
I-SceI recognition sites allows CSR following
expression of I-SceI°. Dong et al. find that
I-SceI-initiated recombinations lack orientation
bias, even when multiple I-SceI sites in
tandem are used to mimic the repetitive nature
of S regions. This lack of bias probably contributes
to the relative inefficiency of CSR following
I-SceI-induced DSBs compared with
normal CSR’.

By contrast, the authors find that CSR 
between an I-Scel-induced non-S-region 
break and an AID-induced S-region break 
within the Igh gene is more than 75% biased
towards the productive orientation. This 
result, together with the greater than 90% 
orientation bias of CSR between two AID- 
induced S-region breaks, suggests that the S 
regions and/or AID contribute to enforcing 
the productive orientation. However, although 
seemingly necessary, the S-region and AID 
are not sufficient for this bias — Dong et al. 
found no orientation preference when the 
AID-induced and the I-SceI-induced breaks 
were in two different chromosomes, a set-up 
that mimics AID-dependent chromosomal 
translocations®. 

Previous work had shown that interchromosomal
fusions between an I-SceI-induced
and a spontaneous DSB are unbiased’.
Future analysis of junctions between AID-induced
or I-SceI-induced breaks in various
genomic locations will make it possible to test 


the authors’ hypothesis that the spatial organi- 
zation of the Igh, in which the S regions are 
brought into contact with each other within 
a physically restrained topological domain 
(Fig. 1), is a key determinant of the orienta- 
tion preference of CSR. A recent paper shows 
that two S regions inserted into another gene 
targeted by AID (the immunoglobulin 
light-chain gene) are not joined to one another 
even if efficiently broken by AID’. Instead, 
the individual DSBs in each S region are 
just rejoined. This observation is consistent 
with a role for the specific topology of Igh in 
promoting the productive joining of breaks 
between S regions. 

Dong et al. obtained further insight into the 
mechanism of CSR by studying cells lacking 
DNA-repair factors that are involved in the 
process. They found that orientation bias was 
reduced in the absence of the enzyme ATM 
kinase, which coordinates the response to 
AID-induced damage’, and in cells lacking the 
DNA-binding proteins H2AX, Rif-1 or 53BP1, 
which act to protect broken ends from being 
resected, thereby promoting NHEJ*’. The 
authors propose that inhibiting end resection 
accentuates an intrinsic predisposition of CSR 
to proceed in a specific orientation, dictated 
by the topology of Igh, by allowing NHEJ to 
repair breaks that are not correctly paired and 
could join in either orientation. However, 
53BP1 prevents end resection by recruiting 
Rif-1 (ref. 9), and Dong et al. find that decreas- 
ing resection in 53BP1-deficient cells did not 
restore the orientation preference, thus reveal- 
ing a resection-independent role for 53BP1 in 
determining orientation. Accordingly, 53BP1 
is required for normal CSR but dispensable 
for I-SceI-mediated CSR°**. Putative 53BP1 


functions include pairing of the S regions and 
influencing the topology of Igh’. Dissecting 
the role of 53BP1 in orientation bias will be 
one of the most interesting challenges arising 
from this work. 

The only other example of orientation- 
biased DNA rearrangement is VDJ recom- 
bination, the process that assembles the 
antibody genes during B-cell development””. 
The mechanisms inducing this bias and that 
of CSR, as revealed by Dong and colleagues, 
are poorly understood and probably differ- 
ent. However, it is unlikely to be a coincidence 
that both have evolved to function in the most 
effective way to ensure the production of 
antibodies and thereby an efficient immune 
response. 
The quiet revolution of numerical weather prediction


Peter Bauer¹, Alan Thorpe¹ & Gilbert Brunet²


Advances in numerical weather prediction represent a quiet revolution because they have resulted from a steady 
accumulation of scientific knowledge and technological advances over many years that, with only a few exceptions, 
have not been associated with the aura of fundamental physics breakthroughs. Nonetheless, the impact of numerical 
weather prediction is among the greatest of any area of physical science. As a computational problem, global weather 
prediction is comparable to the simulation of the human brain and of the evolution of the early Universe, and it is 
performed every day at major operational centres across the world. 


At the turn of the twentieth century, Abbe¹ and Bjerknes² proposed that the laws of physics could be used to forecast the
weather; they recognized that predicting the state of the atmosphere could be treated as an initial value problem of mathematical
physics, wherein future weather is determined by integrating the governing partial differential equations, starting from the observed current
weather. This proposition, even with the most optimistic interpretation
of Newtonian determinism, is all the more audacious given that, at that
time, there were few routine observations of the state of the atmosphere,
no computers, and little understanding of whether the weather possesses
any significant degree of predictability. But today, more than 100 years
later, this paradigm translates into solving daily a system of nonlinear
differential equations at about half a billion points per time step between
the initial time and weeks to months ahead, and accounting for dynamic,
thermodynamic, radiative and chemical processes working on scales
from hundreds of metres to thousands of kilometres and from seconds
to weeks.


A touchstone of scientific knowledge and understanding is the ability 
to predict accurately the outcome of an experiment. In meteorology, this 
translates into the accuracy of the weather forecast. In addition, today’s 
numerical weather predictions also enable the forecaster to assess quan- 
titatively the degree of confidence users should have in any particular 
forecast. This is a story of profound and fundamental scientific success 
built upon the application of the classical laws of physics. Clearly the 
success has required technological acumen as well as scientific advances 
and vision. 

Accurate forecasts save lives, support emergency management and 
mitigation of impacts and prevent economic losses from high-impact 
weather, and they create substantial financial revenue—for example, in 
energy, agriculture, transport and recreational sectors. Their substantial 
benefits far outweigh the costs of investing in the essential scientific 
research, super-computing facilities and satellite and other obser- 
vational programmes that are needed to produce such forecasts’. 

These scientific and technological developments have led to increas- 
ing weather forecast skill over the past 40 years. Importantly, this skill 
can be objectively and quantitatively assessed, as every day we compare 
the forecast with what actually occurs. For example, forecast skill in the 
range from 3 to 10 days ahead has been increasing by about one day per 
decade: today’s 6-day forecast is as accurate as the 5-day forecast ten 
years ago, as shown in Fig. 1. Predictive skill in the Northern and 
Southern hemispheres is almost equal today, thanks to the effective 


use of observational information from satellite data providing global 
coverage. 

More visible to society, however, are extreme events. The unusual
path and intensification of Hurricane Sandy in October 2012 were predicted
8 days ahead, the 2010 Russian heat-wave and the 2013 US cold
spell were forecast with 1-2 weeks lead time, and tropical sea surface
temperature variability following the El Niño/Southern Oscillation phenomenon
can be predicted 3-4 months ahead. Weather and climate
prediction skill are intimately linked, because accurate climate predic- 
tion needs a good representation of weather phenomena and their stat- 
istics, as the underlying physical laws apply to all prediction time ranges. 

This Review explains the fundamental scientific basis of numerical 
weather prediction (NWP) before highlighting three areas from which 
the largest benefit in predictive skill has been obtained in the past— 
physical process representation, ensemble forecasting and model initi- 
alization. These are also the areas that present the most challenging 
science questions in the next decade, but the vision of running 
Figure 1 | A measure of forecast skill at three-, five-, seven- and ten-day ranges, computed over the extra-tropical northern and southern hemispheres. Forecast skill is the correlation between the forecasts and the verifying analysis of the height of the 500-hPa level, expressed as the anomaly with respect to the climatological height. Values greater than 60% indicate useful forecasts, while those greater than 80% represent a high degree of accuracy. The convergence of the curves for Northern Hemisphere (NH) and Southern Hemisphere (SH) after 1999 indicates the breakthrough in exploiting satellite data through the use of variational data assimilation’.

global models at 1 km horizontal resolution, thus with an order of
magnitude greater resolution than today, has added a new dimension, 
as it requires significant investment in high-performance computing 
with as-yet unknown technology. 


The physics of forecasting 
The Navier-Stokes and mass continuity equations (including the effect
of the Earth’s rotation), together with the first law of thermodynamics
and the ideal gas law, represent the full set of prognostic equations that
describe how wind, pressure, density and temperature change in space
and time in the atmosphere*. These equations have to
be solved numerically using spatial and temporal discretization because 
of the mathematical intractability of obtaining analytical solutions, and 
this approximation creates a distinction between so-called resolved and 
unresolved scales of motion. Physical processes that operate on unre- 
solved scales down to the molecular enter the equations for the resolved 
scales through source terms for mass, momentum and heat originating 
from friction, moist processes such as condensation and evaporation, 
and radiative heating and cooling. Since these processes are typically 
unresolved they need to be ‘parameterized’ in terms of their interaction 
with the resolved scales. Simplifications can be applied that facilitate the 
numerical solution and reduce somewhat the complexity of the set of 
equations, as demonstrated for the first time, albeit with limited
success, by Richardson*. By introducing approximations that accurately
describe the largest scales of motion in the atmosphere, the first
attempt to use the first electronic computer for weather prediction was 
carried out in Princeton in 1950°. While the Princeton simulations were 
hindcasts, the first real-time forecasts were made in Stockholm in 1954’. 
Only with increasing availability of supercomputing power in the 
1970s was it feasible to solve the full set of equations as proposed by 
Abbe and Bjerknes*. Consequently, various numerical methods of solu- 
tion emerged that addressed numerical stability, accuracy, computational 
speed’ and versatility to deal with more prognostic variables, and the 
interaction between resolved and unresolved scales’®. The main compo- 
nents of these methods are: the representation of spatial variability by the 
choice of spatial discretization, the time stepping method, the treatment of 
boundaries, and the initialization approach"’. This capability has founded 
what we refer to as NWP”. Today, a hierarchy of many models with 
different levels of complexity exists covering the full range between global 
climate projections’, global weather prediction, and local-area modelling 
for high-impact weather" or air-quality prediction”. 
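To show concretely what "spatial and temporal discretization" means, the sketch below discretizes the simplest transport equation, one-dimensional linear advection ∂q/∂t + c ∂q/∂x = 0. It uses a forward-in-time, first-order upwind finite difference on a periodic grid. This is a textbook toy, assumed here for illustration only. Operational global models use considerably more elaborate methods, such as spectral and semi-Lagrangian techniques. The toy still exposes two generic issues: the stability (CFL) limit on the time step, and the numerical smoothing of features spanning only a few grid cells. The function name and all parameter values are hypothetical.

```python
import numpy as np

def upwind_advect(q0, c, dx, dt, n_steps):
    """Advect a field with constant wind c > 0 on a periodic 1-D grid using a
    forward-in-time, first-order upwind finite-difference scheme."""
    courant = c * dt / dx
    # The explicit scheme is only stable for 0 < c*dt/dx <= 1 (CFL condition).
    if not 0.0 < courant <= 1.0:
        raise ValueError(f"Courant number {courant:.2f} outside (0, 1]")
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(n_steps):
        # np.roll(q, 1)[i] is q[i-1], the upstream neighbour when c > 0
        q = q - courant * (q - np.roll(q, 1))
    return q

# A Gaussian bump about ten grid cells wide is "resolved" on this grid.
x = np.arange(100, dtype=float)
q0 = np.exp(-((x - 20.0) / 5.0) ** 2)
q = upwind_advect(q0, c=1.0, dx=1.0, dt=0.5, n_steps=40)
```

After 40 steps at Courant number 0.5, the bump has moved 20 grid cells and its total is conserved. Its peak, however, has been damped by the scheme's numerical diffusion. Structures narrower than a few grid spacings would be damped far more, which is one reason unresolved processes must be represented differently.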


Major steps 
The improvements in the representation of unresolved processes in 
global models, the advent of ensemble methods producing forecast 
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uncertainty estimates, and the introduction of objective analysis tech- 
niques to determine the initial state have led to the predictive skill 
attained today. Representing physical processes, ensemble modelling 
and model initialization are also the key challenges for the future, com- 
bined with technological challenges associated with observations and 
computing, as we will discuss later. 


Physical processes 
Parameterizations capture radiative, convective and diffusive effects in 
the atmosphere and at the interface between the atmosphere and the 
surface, and are often determined by relatively small spatial scales'®””. 
Figure 2 provides an illustration of these processes and where they are 
relevant. Despite not being resolved, these processes drive heat and 
momentum budgets at the grid scale’*'? and are crucial for achieving 
predictive skill. The degree of parameterization and therefore the rep- 
resentation of the basic physics varies significantly for different pro- 
cesses”’. For example, the global model formulation for radiation and 
cloud microphysics processes is similar to that used in regional and 
high-resolution models because the formulation accounts for the basic 
small-scale physics, which is similar across these model spatial scales, 
even if they require added complexity going to higher spatial resolution. 
The formulations are mostly limited by our understanding of physical 
process detail needed for parametric representations that define the 
spatially averaged impact of the process on momentum and heat fluxes. 
On the other hand, deep convection and specific boundary layer pro- 
cesses require a higher degree of parametric formulation as they only 
occur in small fractions of the grid scale; consequently these parameter- 
izations critically depend on which resolution is actually used. 
Parameterizations play a fundamental role in determining predictive 
skill because they determine key aspects of the simulated weather, such 
as clouds and precipitation, as well as temperature and wind. In opera- 
tional NWP models, essentially the same formulation for the parame- 
terizations is used for scales of 10-100 km in short-to-medium range 
forecasts, minimization algorithms used for model initialization, and 
seasonal range forecasts. Achieving this element of ‘grid-scale invari- 
ance’ while including as much physical process detail as possible has 
been a fundamental breakthrough in the recent past. 
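Among the simplest examples of the idea described above is a bulk formula for surface friction. It expresses the effect of unresolved turbulent eddies on the lowest model layer purely in terms of the resolved wind. The sketch below assumes the common bulk form du/dt = -C_D |V| u / h. The drag coefficient C_D, the layer depth h and the function name are illustrative assumptions, not values or code from any operational model. Real boundary-layer schemes make C_D depend on stability and surface roughness.

```python
import numpy as np

def surface_drag_tendency(u, v, drag_coeff=1.5e-3, layer_depth=1000.0):
    """Hypothetical bulk-formula parameterization of unresolved turbulent
    surface friction acting on the lowest model layer.

    Returns (du/dt, dv/dt) in m s^-2, with du/dt = -C_D |V| u / h, where |V|
    is the resolved wind speed, C_D a drag coefficient and h the layer depth
    in metres. Both default values are illustrative only."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)                       # resolved wind speed |V|
    factor = -drag_coeff * speed / layer_depth   # always <= 0: drag decelerates
    return factor * u, factor * v

# Resolved winds (m/s) at three grid columns
u = np.array([10.0, 0.0, -5.0])
v = np.array([0.0, 0.0, 5.0])
du, dv = surface_drag_tendency(u, v)
```

The resulting tendencies act as source terms for momentum in the resolved equations, always opposing the resolved wind.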


Ensemble modelling 

Early in the twentieth century, Poincaré” recognized that forecasts of 
nonlinear systems can vastly differ if small perturbations are applied to 
the initial conditions, and that this difficulty could be fundamental in 
limiting predictive skill. In the 1950s, Thompson performed one of the 
first quantitative estimates of initial errors growing during the forecast”, 
while Lorenz” formulated this understanding more holistically and 


Figure 2 | Physical processes of importance to 
weather prediction. These are not explicitly 
resolved in current NWP models but they are 
represented via parameterizations describing their 
contributions to the resolved scales in terms of 
mass, momentum and heat transfers. 
founded chaos theory as a result of his attempt to quantify atmospheric
predictability. His conclusion, that unstable systems have finite,
state-dependent limits of predictability, gave rise to the need to encapsulate
the growth of initial-condition uncertainties, their evolution as a
function of the atmospheric state, and errors introduced by imperfect 
models. The recognition of imperfect forecasts” and determining how 
to calculate analysis and forecast uncertainty using an ensemble 
approach” represent major and unique accomplishments in physical 
sciences. This is particularly true for the prediction of highly variable 
parameters like precipitation (Fig. 3), where ensemble spread quantifies 
forecast uncertainty of rainfall location and intensity and thus provides 
essential information to users. 
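The probability-of-precipitation product described in Fig. 3 can be sketched as the fraction of ensemble members that exceed a rainfall threshold. A simple spatial neighbourhood is included: a member counts at a grid point if it exceeds the threshold anywhere within a small square window. This neighbourhood rule, the square window and the function name are assumptions for illustration. The text does not specify how the Met Office combines the members.

```python
import numpy as np

def neighbourhood_probability(members, threshold, radius=1):
    """Hypothetical helper: fraction of ensemble members exceeding `threshold`
    at each grid point, where a member counts if any point within `radius`
    cells (square window) exceeds the threshold.

    members: array of shape (n_members, ny, nx), e.g. rainfall in mm."""
    exceed = np.asarray(members) > threshold
    _, ny, nx = exceed.shape
    # Pad the two spatial axes with False so the window can slide off the edge.
    padded = np.pad(exceed, ((0, 0), (radius, radius), (radius, radius)))
    hit = np.zeros_like(exceed)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            hit |= padded[:, dy:dy + ny, dx:dx + nx]
    return hit.mean(axis=0)          # probability in [0, 1] per grid point

# Four members on a 3 x 3 grid; only member 0 rains, at the centre point.
members = np.zeros((4, 3, 3))
members[0, 1, 1] = 5.0
p0 = neighbourhood_probability(members, threshold=1.0, radius=0)
p1 = neighbourhood_probability(members, threshold=1.0, radius=1)
```

Without a neighbourhood (`p0`), only the centre point gets a 25% probability. With a one-cell neighbourhood (`p1`), the same 25% is spread over the surrounding points. This smooths the field and acknowledges that exact rainfall locations are uncertain.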

The nonlinear complexity of the system means that purely statistical 
methods to assign an uncertainty to the forecast are inadequate. Instead, 
an ensemble of many complete, physical, nonlinear realizations of the 
system is needed”*”, providing a seamless analysis and forecast ensemble 
in which observational information is used to reduce uncertainty. In 
practice, the ensemble members are created using perturbations, equival- 
ent to analysis and model errors, added to the initial state and the model 
physical processes. Determining these perturbations consistently and 
seamlessly so that the ensemble provides a good estimate of uncertainty 
across a wide range of prediction scales is challenging, and the input of 
mathematics and statistical physics expertise was crucially important**”’. 
Weather forecasts today involve an ensemble of numerical weather pre- 
dictions, providing an inherently probabilistic assessment. 


Model initialization 

Early methods for the specification of initial conditions were based on 
the analysis of graphical and synoptic weather charts. Various forms of 
interpolation procedures were later replaced by data assimilation tech- 
niques based on optimum control theory”. The derivation of the current 
state (called the analysis) of the atmosphere and surface is treated as a 
Bayesian inversion problem using observations, prior information from 
short-range forecasts and their uncertainties as constraints as well as the 
forecast model*'**. These calculations, involving a global minimization, 
are performed in four dimensions to produce an analysis that is phys- 
ically consistent in space and time and can deal with huge amounts of 
observational data that are heterogeneously distributed in space and 
time (such as the vast amount and diversity of satellite data used for 
Earth observation since the 1980s). Since initial state uncertainty estima- 
tion is also crucial for ensemble prediction and because data assimilation 
employs both imperfect observations and forecast model, ensemble 
methods have also become an integral part of data assimilation®’, as 
shown in Fig. 4. 
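In the simplest setting, the Bayesian inversion described above has a closed-form solution. That setting is a single time, a linear observation operator H, and Gaussian background and observation errors with covariances B and R. The analysis is then x_a = x_b + K(y − Hx_b), with gain K = BHᵀ(HBHᵀ + R)⁻¹. This is the same state that minimizes the three-dimensional variational cost function. The sketch below assumes this idealized case and is not how operational systems compute the analysis. They minimize iteratively over a four-dimensional window, use a nonlinear forecast model, and never form B explicitly. The function name and all numbers are hypothetical.

```python
import numpy as np

def linear_analysis(x_b, B, y, H, R):
    """Analysis for a linear observation operator H and Gaussian errors:
    x_a = x_b + K (y - H x_b),  K = B H^T (H B H^T + R)^-1,
    with analysis-error covariance A = (I - K H) B."""
    innovation = y - H @ x_b                 # observation minus background
    S = H @ B @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ B).T          # equals B H^T S^-1 (B, S symmetric)
    x_a = x_b + K @ innovation
    A = (np.eye(len(x_b)) - K @ H) @ B
    return x_a, A

# Two-variable state; only the first variable is observed.
x_b = np.array([0.0, 0.0])
B = np.array([[1.0, 0.5], [0.5, 1.0]])       # correlated background errors
H = np.array([[1.0, 0.0]])
y = np.array([1.0])
R = np.array([[1.0]])
x_a, A = linear_analysis(x_b, B, y, H, R)
```

The observation pulls the observed variable halfway towards it, because the background and observation errors are equal. The unobserved variable also moves, by 0.25, because B correlates the two. This is how prior error statistics spread observational information to places and variables that are not observed directly.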

The operational implementation of these four-dimensional variational 
(4D-Var) data assimilation techniques** marks a major milestone in 
operational global NWP. At the European Centre for Medium-Range 
Weather Forecasts (ECMWF) this occurred in 1997*, followed by 
Figure 4 | Schematic of the ensemble analysis and forecast cycle. Global ensemble forecast trajectories, which have been initialized by a previous analysis ensemble, are produced over a time window (for example, 09:00-21:00 UTC). These provide estimates of the current weather (first guesses). The difference between these forecasts and available observations (shown as data points with error bars) is the short-range forecast error. By minimization in four dimensions employing variational techniques, improved estimates (4D-Var trajectories) are created with reduced distance to observations. The next cycle of ensemble forecasts is then initialized from these refined analyses. Image courtesy of M. Bonavita (ECMWF).


Météo-France in 2000°, the Met Office in 2004’, both the Japan 
Meteorological Agency** and Environment Canada in 2005”, and the 
United States Naval Research Laboratory in 2009*°. Development and 
first implementation of 4D-Var took more than 10 years, and further 
research has substantially refined the main ingredients. These were 
the increasing use of satellite radiance data by combining the forecast 
model with computationally efficient radiative transfer models‘, 
the much refined characterization of short-range forecast*’ and obser- 
vation errors** using state dependent weights for each, and better use of 
observations arising from significant improvements of physical 
parameterizations”. 
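For reference, the quantity minimized in 4D-Var can be written in the strong-constraint form common in the data-assimilation literature. That form assumes the forecast model is perfect within the assimilation window. The symbols below are introduced here for illustration:
- x₀ is the model state at the start of the window and x_b the background (short-range forecast).
- B and Rᵢ are the background- and observation-error covariances.
- yᵢ are the observations at time tᵢ.
- Hᵢ is the observation operator (for satellite radiances, a radiative transfer model).
- M₀→ᵢ propagates the state with the forecast model.

Operational systems minimize an incremental, approximated version of this function, and their details differ between centres.

```latex
J(\mathbf{x}_0) = \tfrac{1}{2}\,(\mathbf{x}_0 - \mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}_0 - \mathbf{x}_b)
 + \tfrac{1}{2}\sum_{i=0}^{N}\bigl(\mathbf{y}_i - H_i[M_{0\to i}(\mathbf{x}_0)]\bigr)^{\mathrm{T}}\,\mathbf{R}_i^{-1}\,\bigl(\mathbf{y}_i - H_i[M_{0\to i}(\mathbf{x}_0)]\bigr)
```

The first term keeps the analysis close to the prior forecast. The sum penalizes misfits to every observation in the window, each evaluated at its own time through the model trajectory. This is what makes the resulting analysis dynamically consistent in space and time.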


Predictability and predictive skill 


A continuing and important area of research focuses on the sources of 
predictability in the Earth system. Forecasting future weather is like a 
battleground, with the forces of predictability pitched against those of 
unpredictability. The sources of predictability include large-scale for- 
cing of smaller-scale weather, teleconnections or the chain of predict- 
ability across different geographical areas*®, and the interactions 
between atmosphere, land surfaces and vegetation, sea-ice and ocean 
acting on longer timescales. The sources of unpredictability include 


Figure 3 | Schematic diagram of 36-h ensemble 
forecasts used to estimate the probability of 
precipitation over the UK. A single forecast (red 
frame, centre) is generated by integrating the 
model forward in time from the analysis of initial 
atmospheric state (left). Small perturbations to the 
analysis, within known analysis uncertainty, 
provide an ensemble of forecast solutions, which 
sample the forecast uncertainty (multiple frames). 
These solutions are combined, including some 
spatial neighbourhood sampling, to provide a 
smooth estimate of probability of precipitation 
(right). Image courtesy of K. Mylne (Met Office). 




3 SEPTEMBER 2015 | VOL 525 | NATURE | 49 


©2015 Macmillan Publishers Limited. All rights reserved 


REVIEW 


instabilities injecting chaotic ‘noise’ at small scales and the upscale pro- 
pagation of their energy, the errors associated with numerical and phys- 
ical approximations, as well as the insufficient number and poor use of 
observations. Box 1 provides an example of such teleconnections and the 
sources of poor forecast performance over Europe in the medium range. 

The outcome of this ‘battle’ can be described as noise growing non- 
linearly during the forecast and thereby leading to fundamental limits of 
how far into the future certain structures can be predicted. The limit for 
small-scale events is between hours and days, for accurate and reliable 
prediction of high-impact weather events about 1-2 weeks, for predic- 
tion of large-scale weather patterns and regime transitions about a 
month, and for global circulation anomalies about a season’’. The longer 
the forecast range the more the predictive skill relates only to anomalies, 
that is, the difference between the state and its modelled climatological 
mean, and the more important space-time averaging becomes to iden- 
tification of the signal. In the short range predictive skill exists for the 
details, while in the long range skill relates to larger-scale structures. 
Predictive capability that is seamless across this wide range of forecast 


BOX 1
Sensitivity of forecasts to initial conditions and error propagation 


horizons is therefore about capturing processes acting on very different 
time and space scales. 

NWP has a fundamental advantage over many other scientific dis- 
ciplines in that its skill is objectively evaluated daily and globally, so that 
success and failure of forecasts is accurately known and pathways to 
improve predictive skill can be effectively tested**°. To evaluate forecast 
skill, metrics such as mean and root-mean-square errors, and the cor- 
relation of the forecast with analysis anomalies of upper-air and surface 
forecast fields are used. In addition, scores targeting more variable para- 
meters such as precipitation®’ exist. Model biases become significant 
further into the forecast range. While biases can be reduced through 
calibration using past forecasts*', the identification of their sources in 
complex models remains one of the dominant challenges for NWP
and even more so for climate prediction’. Diagnostic methodologies 
employing data assimilation statistics’ can help since the signature of 
most biases is already evident in the analysis and early in the forecast, 
even though their magnitude is small. This approach offers benefits for 
weather and climate science alike. 
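Two of the metrics named above can be sketched in a few lines: root-mean-square error, and the anomaly correlation used for Fig. 1. In Fig. 1, values above 60% are described as useful and values above 80% as highly accurate. The sketch assumes one common, uncentred definition of the anomaly correlation, with optional area weights (for example, the cosine of latitude on a regular latitude-longitude grid). Verification centres differ in such details, and the function names and numbers are hypothetical.

```python
import numpy as np

def anomaly_correlation(forecast, analysis, climatology, weights=None):
    """Uncentred anomaly correlation: correlation of forecast and analysis
    departures from climatology, optionally area-weighted."""
    f = np.asarray(forecast, dtype=float) - climatology
    a = np.asarray(analysis, dtype=float) - climatology
    w = np.ones_like(f) if weights is None else np.asarray(weights, dtype=float)
    return np.sum(w * f * a) / np.sqrt(np.sum(w * f * f) * np.sum(w * a * a))

def rmse(forecast, analysis, weights=None):
    """(Weighted) root-mean-square difference between forecast and analysis."""
    d = np.asarray(forecast, dtype=float) - np.asarray(analysis, dtype=float)
    w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
    return np.sqrt(np.sum(w * d * d) / np.sum(w))

# 500-hPa heights (m) at four points: climatology, verifying analysis, forecast
clim = np.full(4, 5500.0)
ana = clim + np.array([10.0, -20.0, 5.0, 0.0])
fc = clim + 2.0 * (ana - clim)      # right pattern, anomalies twice too strong
acc = anomaly_correlation(fc, ana, clim)
err = rmse(fc, ana)
```

The example shows why the two scores complement each other. A forecast with the correct anomaly pattern but exaggerated amplitude scores a perfect anomaly correlation of 1, yet still has a non-zero RMSE.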
Box 1 Figure | Maps showing the long-range impact of model
initialization on the European forecast. Panel a shows the day-6 
mean forecast error (the height of the 500 hPa pressure level in 
metres) of the flow at around 5 km height (colour-coded shading), the 
forecast itself (solid isolines) and the verifying analysis (dashed 
isolines) valid on 15 February 2014. Over the western US, the jet 
stream extended far to the south, aligned with a lower level trough. The 
long red arrow indicates the travel path of an atmospheric wave 
disturbance guided by the westerly flow. The presence of a large-scale 
dipole error pattern highlights the lag between forecast and analysed 
state (blue double-headed arrow). The large forecast errors over 
Europe were mostly produced by a phase-shift of the wave that 
increased with time. Back-tracking the wave propagation path 
identifies the tropical East Pacific (boxed in b) as a likely location of a
possible forecast error source. This area was characterized by very 
large 24-h forecast errors of upper-level winds because of the paucity 
of wind observations there. When running an experiment where the 
area in the box in b is relaxed towards the analysis rather than evolving
in the forecast, the strong initial growth of forecast errors is reduced 
and, six days later, the lag of the wave patterns between forecast and 
analysis is reduced over Europe (blue double-headed arrow), 
producing about half of the original forecast errors. This experiment 
demonstrates the long-range impact of model initialization, the 
linkage between tropics and mid-latitudes, and thus shows an 
example of how predictive skill in the one-week time range can be 
increased. 
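Box 1 does not say exactly how the boxed region was "relaxed towards the analysis". A common technique for this kind of experiment is Newtonian relaxation (nudging). Each time step, the model state inside a masked region is pulled towards the analysis with an e-folding time scale τ, while the rest of the domain evolves freely. The sketch below assumes that technique and shows only the relaxation increment. The mask, τ and the function name are hypothetical, and a real experiment would also advance the model dynamics at every step.

```python
import numpy as np

def relax_towards_analysis(state, analysis, mask, dt, tau):
    """Hypothetical helper: one Newtonian-relaxation ("nudging") increment.
    Inside `mask` the state is pulled towards the analysis with e-folding
    time tau; outside the mask it is left unchanged."""
    alpha = dt / tau
    if not 0.0 < alpha <= 1.0:
        raise ValueError("dt / tau must lie in (0, 1]")
    state = np.asarray(state, dtype=float)
    return np.where(mask, state + alpha * (analysis - state), state)

state = np.zeros(4)                           # e.g. a forecast wind component
analysis = np.ones(4)                         # verifying analysis
mask = np.array([True, True, False, False])   # the "boxed" region
for _ in range(10):
    # a real experiment would also advance the model dynamics here
    state = relax_towards_analysis(state, analysis, mask, dt=1.0, tau=2.0)
```

Inside the box the state converges geometrically on the analysis. Outside it, the state is governed only by the (here omitted) model dynamics. This is what lets such an experiment isolate the downstream impact of errors originating in one region.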
As NWP involves an ensemble of forecasts, evaluation metrics need to
assess the moments of probability distributions such as ensemble mean 
error and the sharpness of the distributions. Forecast reliability is deter- 
mined by comparing forecast distributions with the observed frequency 
of occurrence. Since ensembles are designed to provide valuable 
information on the probability of weather extremes™, scores targeting 
the tails of probability distributions are being developed accounting for 
sparse statistics”. 
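Reliability, as described above, can be checked with a reliability table. Probability forecasts are grouped into bins, and the mean forecast probability in each bin is compared with the observed frequency of the event. For a reliable system the two roughly agree. The Brier score, the mean squared difference between forecast probability and binary outcome, is a standard summary score for the same forecasts. The sketch below uses these two well-known tools as an example. The document does not state which scores operational centres use, and the function names and data are hypothetical.

```python
import numpy as np

def brier_score(prob, outcome):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return np.mean((prob - outcome) ** 2)

def reliability_table(prob, outcome, n_bins=10):
    """Rows of (mean forecast probability, observed frequency, count) for each
    non-empty probability bin of width 1 / n_bins."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize gives 1..n_bins+1; shift to 0-based and put p == 1 in the top bin
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            rows.append((prob[in_bin].mean(), outcome[in_bin].mean(), int(in_bin.sum())))
    return rows

prob = np.array([0.15, 0.15, 0.85, 0.85])   # forecast probabilities of rain
outcome = np.array([0, 1, 1, 1])            # 1 = rain observed
bs = brier_score(prob, outcome)
table = reliability_table(prob, outcome)
```

In this tiny example the 15% forecasts verified half the time, which would signal under-forecasting if it held over a large sample. With few cases per bin, the statistics are very noisy. This is exactly the sparse-statistics problem that makes scores for the tails of the distribution hard to design.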

In addition, comprehensive feature-based evaluation is available for 
tropical cyclones”® or weather regimes”, and for the evaluation of how 
well models represent the links between lower and higher latitudes***?, 
troposphere and stratosphere’, planetary wave activity driving syn- 
optic scale features”, and synoptic scales interacting with small-scale 
convection®*™ and the surface. 

An effective way to verify predictive skill also arises from combining 
weather with hydrological modelling, whereby predicted river stream- 
flow and discharge help to evaluate predictions of precipitation, run-off 
and storage in NWP models, both for single realization and ensemble 
forecasts’’**. The enhancement of weather models with variables 
describing atmospheric composition such as aerosols and trace gases 
also introduces new ways to evaluate atmospheric evolution by consid- 
ering tracer advection and model chemistry parameterizations”. 


Where we are today 


Operational NWP centres provide predictions from the very short range 
at kilometre scale multiple times per day up to global seasonal forecasts 
at tens of kilometres horizontal resolution once per month. These fore- 
casts relate to the weather but are also extending to air-quality” and 
hydrological’! applications. 

Data assimilation algorithms employ the forecast model and of the 
order of 10 observations per day to derive initial conditions that are 
physically consistent in four dimensions: over the globe, from surface up 
to mesosphere (~80 km) and along time windows from hours to days. 
Operational models are updated frequently to incorporate new science 
that enables improvements in the representation of model physics and 
model uncertainty, in numerical algorithms and observational data 
usage, and to enhance computational efficiency. 

Gauging the relative contributions to success and progress from 
model development, data assimilation algorithms and observational 
data usage is difficult because they are interdependent. More accurate 
model physics means that forecasts compare better with observations 
and facilitate improved data assimilation; in turn this permits ingestion 
of more observations thereby further improving forecasts. 

NWP has also benefited enormously from computing advances. In 
terms of floating point operations, computing power has increased by 
about one order of magnitude every five years since the 1980s. This is the 
result of processor technology advances and more processors being 
used. Moore’s law, named after Intel co-founder Gordon Moore, is commonly
stated as computing power doubling every 18 months owing to increased
transistor density per chip and clock speeds. This growth has gone
hand-in-hand with the increasing size of the analysis and forecast computational task in NWP. At
ECMWF, the data assimilation performs model integrations in multiple
stages totalling of the order of 100 iterations across a 12-h window for a 
total of 650 million grid-point calculations. In parallel, about 10 million 
radiance calculations are performed to compare the forecast model with 
satellite observations from more than 60 instruments. Today, the 
ECMWF 16-km highest-resolution model performs calculations on two 
million grid columns with 10-min time stepping over a 10-day period, 
that is, 1,440 time steps. The corresponding ensemble produces 15-30 day 
forecasts with 50 members with a horizontal resolution of 30-60 km and 
30-min time steps. Thus twice per day about 40 billion grid-column 
calculations are performed in about 2.5 h real time. This computing task 
demands some of the largest supercomputing facilities available. 
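Two figures quoted above can be checked with back-of-the-envelope arithmetic. First, the number of time steps in the high-resolution forecast. Second, whether a tenfold increase every five years matches an 18-month doubling time. The sketch uses only numbers given in the text. It does not attempt to reproduce the 40 billion total, because the number of grid columns in the ensemble is not stated.

```python
import math

# High-resolution ECMWF forecast, as quoted in the text
forecast_days = 10
time_step_minutes = 10
steps = forecast_days * 24 * 60 // time_step_minutes   # number of time steps
grid_columns = 2_000_000
column_steps = grid_columns * steps                      # grid-column updates

# "One order of magnitude every five years" expressed as a doubling time:
# 10 ** (t / 60 months) = 2  =>  t = 60 * log(2) / log(10) months
doubling_months = 5 * 12 * math.log(2) / math.log(10)
```

This gives 1,440 time steps, as stated, and about 2.9 billion grid-column updates per high-resolution forecast. The doubling time is about 18.06 months. The growth in computing power available to NWP since the 1980s is therefore consistent with the 18-month doubling usually attributed to Moore's law.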

The time series describing the improving skill of global NWPs is 
impressive (Fig. 1), revealing that while there is some year-to-year vari- 
ability, for more than three decades forecast skill has been advancing 
continuously’””’. Predictive skill improves at a rate such that useful skill
is retained one more day into the forecast range for every decade of 
research and development. This steady progress has been the result of 
advances in the science, in the utilization of observations and in super- 
computing capacity. Some of the fluctuations in skill are a result of 
periods when the atmosphere exhibits more or less potential predict- 
ability. This means that certain weather regimes appear to be easier to 
predict accurately further into the future than others. Our understanding
of these flow regimes is improving and is enabling a more discerning
quantification of predictive skill.


The future is bright 


The evolution of weather science as well as of high-performance com- 
puting and observing systems in the future is crucial for continuing the 
progress in NWP. Critical scientific and technological crossroads have
been reached or are very likely to be reached in the near future.
Consequently, the present period is of fundamental importance for
how weather forecasting and also climate science will evolve. Building
on anticipated advances in the understanding of physical processes, in
numerical model development, in observation technology and in
high-performance computing, the vision for global weather and climate
modelling a decade or more in the future has two parts. In terms of
resolution, the goal is to perform global convection-resolving simulations
at a horizontal resolution of the order of 1 km. In terms of complexity,
the goal is to run fully coupled atmosphere-land-ocean-sea-ice models.
Ensembles at this resolution and complexity will predict probabilities
of dynamics, physics, chemistry and probably selected bio-chemical
processes into the multi-seasonal range for weather, and into the
multi-decadal range for climate. These global predictions will provide
essential initial and boundary information for finer-scale
limited-geographical-domain simulations of short-range detailed weather development.


The scientific challenges 
The main scientific challenges for future global NWP relate to the main 
themes that have produced key advances in the recent past and that have 
brought weather forecasting to the level where it is today: physical 
process parameterization, analysis and forecast uncertainty formulation 
through ensembles, and the provision of physically consistent initial 
conditions for forecasts using observations. There are a number of key 
areas in which substantial progress can be expected in the future that 
also require significant advances compared to current thinking. 
Regarding physical parameterizations, one might anticipate that with 
increasing resolution the need for parameterization would be gradually 
reduced. For radiation and cloud processes and land surface models this 
is a matter of moving current schemes towards fully explicit models 
already used in regional and local applications at the kilometre scale. 
For convection, the situation is more complex because large tropical 
convective clouds or organized convection occur even at currently 
resolved scales (15 km) while embedded small-scale convective plumes 
may not be resolved even at 1 km and will still require parameterization. 
This range of model resolutions with partly resolved convection is also 
referred to as the grey zone, since resolved and parameterized contribu- 
tions to fluxes need to be quantified and combined. Existing schemes 
assume that convection is entirely unresolved and so they are not able to 
adequately represent the impact of both resolved and unresolved process 
components on heat and momentum at resolved scales in the grey zone. 
High-resolution limited-area cloud models have demonstrated that
the dynamic modes of organized convection can be captured. They have
also shown that the modelling of the lifecycle of convection, cloud
organization or its interaction with the large-scale circulation can be
improved. It is not clear at present whether running global models at
scales of the order of 1 km also eliminates all convection-related
uncertainties, or whether it provides a fundamental stepping stone
towards reduced model biases and enhanced predictive skill at all
forecast ranges. As these high resolutions are not yet within reach,
convection parameterizations will remain crucial for global weather and
climate modelling for the next decade. Progress in this area will require
joint efforts in the weather and climate communities.
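One way to picture the grey-zone problem is a scale-aware blend, in which the parameterized flux is scaled down as more of the convection becomes resolved. The sketch below is illustrative only. The weighting function `parameterized_fraction`, its length scale `L_KM` and the flux values are hypothetical assumptions, not the scheme of any operational model.

```python
import math

L_KM = 5.0  # hypothetical length scale (km) at which convection starts to be resolved


def parameterized_fraction(dx_km: float) -> float:
    """Hypothetical weight in [0, 1]: near 0 on fine grids, near 1 on coarse grids."""
    return 1.0 - math.exp(-(dx_km / L_KM) ** 2)


def total_convective_flux(resolved: float, parameterized: float, dx_km: float) -> float:
    """Combine the explicitly resolved flux with a scaled parameterized flux.

    `parameterized` is what a conventional scheme would return if convection
    were entirely unresolved. It is scaled down on fine grids so that the two
    contributions are not double counted in the grey zone.
    """
    return resolved + parameterized_fraction(dx_km) * parameterized


for dx in (1.0, 5.0, 15.0):
    print(dx, round(parameterized_fraction(dx), 3))
```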

There are two other areas that need much more attention in the future 
and promise significant boosts of skill, but also involve substantial 
investments in scientific development and computing. 

First, the uncertainties inherent to physical parameterizations, either 
from incomplete process understanding or the dilemma of representing 
the impact of unresolved processes on the resolved scales, may require a 
fundamentally different approach. Elements of parameterizations or 
entire schemes are likely to require components that appear statistical 
to the large scales because they are not fully determined by the resolved 
scales. Examples are stochastic sampling of parameter probability
distribution functions, stochastically driven sub-cell models, or
superparameterizations through embedding entire convection-resolving
simulations at sub-grid scale. How radical this approach needs to be is 
currently not clear. 
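The first of these options, stochastic sampling of a parameter's probability distribution, can be sketched as drawing a different value of an uncertain parameter for each ensemble member. The following are hypothetical assumptions chosen for illustration:

- the toy scheme, a relaxation tendency controlled by an entrainment-like rate;
- the lognormal distribution of that rate;
- all numbers.

```python
import math
import random


def toy_tendency(state: float, equilibrium: float, rate: float) -> float:
    """Hypothetical parameterized tendency: relax `state` towards `equilibrium`."""
    return -rate * (state - equilibrium)


def sample_rates(n_members: int, median: float, sigma: float, seed: int = 0) -> list[float]:
    """Draw one positive parameter value per ensemble member from a lognormal PDF."""
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(median), sigma) for _ in range(n_members)]


rates = sample_rates(n_members=50, median=1e-3, sigma=0.3)
tendencies = [toy_tendency(300.0, 298.0, r) for r in rates]
print(min(tendencies), max(tendencies))  # spread reflects parameter uncertainty
```

A lognormal is used here only because it keeps the sampled rate positive. Any distribution that reflects the actual uncertainty of the parameter could take its place.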

Second, more physical as well as chemical processes will be added. 
More physical processes will be needed because of the modelled coup- 
ling of the atmosphere with ocean, land surface and sea-ice models, 
some of which are already in operational use today. Each coupling
has its own characteristic space and time scales and the coupling per se 
provides most benefit beyond the 3-7 day range since ocean, sea-ice and 
land surface processes are relatively slow and mostly affect longer-term 
system memory. However, there are examples where coupling also 
affects the short range: for example, when oceanic upwelling in the wake 
of slowly moving tropical cyclones affects their intensity, or where rain- 
fall over land is strongly constrained by surface evaporation and thus soil 
moisture. 

The greatest scientific challenge for coupling is associated with 
matching fluxes at the interfaces, where systematic errors in each
component interact and can produce model shocks and compensating
changes of mean state at every coupling time step and through feedbacks 
in longer integrations. 
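Because each component has its own time scale, couplers commonly exchange time-averaged fluxes at a coarser coupling interval. The sketch below shows that bookkeeping under several assumptions:

- the flux function is a hypothetical toy atmosphere;
- the atmospheric step is 10 min, coupled hourly.

Accumulating the flux over each interval and passing on its mean keeps the energy handed to the ocean equal to what the atmosphere produced.

```python
ATMOS_DT_S = 600.0          # assumed atmospheric time step (10 min)
STEPS_PER_COUPLING = 6      # assumed coupling interval: every hour


def atmos_surface_flux(step: int) -> float:
    """Hypothetical heat flux (W m-2) the atmosphere computes at each step."""
    return 100.0 + 10.0 * (step % 3)


def run_coupled(n_atmos_steps: int) -> tuple[float, float, int]:
    """Return (energy emitted by atmosphere, energy passed to ocean, ocean calls), per m2."""
    accumulated = 0.0
    emitted = received = 0.0
    ocean_calls = 0
    for step in range(n_atmos_steps):
        flux = atmos_surface_flux(step)
        accumulated += flux * ATMOS_DT_S
        emitted += flux * ATMOS_DT_S
        if (step + 1) % STEPS_PER_COUPLING == 0:
            interval_s = STEPS_PER_COUPLING * ATMOS_DT_S
            mean_flux = accumulated / interval_s   # time-mean flux over the interval
            received += mean_flux * interval_s     # energy delivered over the ocean step
            ocean_calls += 1
            accumulated = 0.0
    return emitted, received, ocean_calls


print(run_coupled(36))  # six hours of atmosphere -> six ocean steps
```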

Atmospheric constituents such as trace gases and aerosols directly
affect radiative heating. Aerosols can also act as cloud condensation
nuclei, and heterogeneous chemistry occurs at the surface of polar
stratospheric clouds, accelerating ozone destruction. In addition,
aerosols and trace gases are important to forecast in their own right
because of their impacts on air quality. An associated challenge of
adding more physical and chemical processes is that initial conditions
for these constituents are also required. More, and more complex,
observations therefore need to be assimilated. Ensemble prediction reliability
beyond the medium range will therefore be enhanced by representing 
the uncertainty of much more complex processes in models and by being 
able to initialize coupled models using much more diverse observations. 

Making more use of existing and new observations, and advancing data
assimilation, pose further scientific challenges for NWP. Currently,
each global forecast uses about 5-10% of the total satellite data volume;
this fraction contains most of the information content relevant to that
particular forecast. Such selection is of fundamental importance for
optimally managing the substantial global investment in Earth observation,
especially from satellites. However, NWP remains limited by insufficient
observational data. Beyond the maintenance of the backbone satellite and
ground-based observing systems that measure vertical profiles of temperature,
moisture, clouds and near-surface weather, fundamental observables are
missing. An example is the direct observation of upper-level wind with
Doppler-radar technology, which is not yet available in operational
satellite programmes. Wind information is primarily needed in the tropics,
an area covering around 50% of the Earth, where sparse observations are a
serious impediment to increased analysis accuracy. At the same time, the
existing backbone observations need to be provided by a robust and
resilient observing system, which requires substantial international
investment and coordination. A similar level of coordination is required
for satellite and ground-based observations.
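The text does not say how the 5-10% of satellite data is chosen. One widely used ingredient is spatial thinning, which keeps at most one observation per grid box. The sketch below rests on three assumptions:

- observations are hypothetical (lat, lon, value) records;
- boxes have a fixed size in degrees;
- the first observation falling in each box is the one kept.

Operational systems apply more elaborate quality and information-content criteria.

```python
from typing import NamedTuple


class Obs(NamedTuple):
    lat: float
    lon: float
    value: float


def thin(observations: list[Obs], box_deg: float) -> list[Obs]:
    """Keep the first observation found in each box_deg x box_deg lat/lon box."""
    kept: dict[tuple[int, int], Obs] = {}
    for ob in observations:
        key = (int(ob.lat // box_deg), int(ob.lon // box_deg))
        kept.setdefault(key, ob)  # later observations in the same box are dropped
    return list(kept.values())


obs = [Obs(10.1, 20.1, 1.0), Obs(10.2, 20.3, 2.0), Obs(11.5, 20.1, 3.0)]
print(thin(obs, box_deg=1.0))
```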

Notwithstanding the complexity of current data assimilation there are 
many challenges for the future, most of all regarding improved solution 
algorithms; such algorithms will be targeted at enhancing the exploitation
of new observational data, but will also be able to handle improved
models. Computational affordability will continue to be a constraint, 
given that a sizeable proportion of the cost of producing a forecast is 
associated with data assimilation. Next-generation data assimilation 
methods will probably employ fundamentally new mathematics, but 
assimilation methods in the near future will probably be based on a 
combination of existing concepts. 

Current algorithms commonly rely on linear principles and vari- 
ational methods, whereas certain components, such as error statistics, 
are obtained from ensembles. The variational principle has been
implemented in different ‘flavours’. The next decade is likely to be
dominated either by choosing the most effective combination of variational
and ensemble elements or by using purely ensemble-based methods
such as ensemble Kalman filters. Smaller-scale effects operating on shorter
time scales (for example, convection) may require nonlinear data
assimilation methods, for which only limited experimentation with idealized
models exists. These are currently difficult to generalize for global
operational applications.
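In the linear, Gaussian case the variational and Kalman-filter views give the same analysis. The scalar sketch below illustrates this with three functions:

- `var_analysis` minimizes the usual variational cost function by gradient descent;
- `kalman_analysis` applies the Kalman update directly;
- `enkf_analysis` estimates the background variance from an ensemble, echoing the text's point that error statistics can come from ensembles.

This is a single-variable toy with hypothetical numbers. It uses the perturbed-observation form of the ensemble Kalman filter, which is one of several variants.

```python
import random
import statistics


def kalman_analysis(xb: float, b: float, y: float, r: float) -> tuple[float, float]:
    """Scalar Kalman update: background xb (variance b), observation y (variance r)."""
    gain = b / (b + r)
    return xb + gain * (y - xb), (1.0 - gain) * b


def var_analysis(xb: float, b: float, y: float, r: float, iters: int = 100) -> float:
    """Minimize J(x) = (x - xb)^2 / (2b) + (y - x)^2 / (2r) by gradient descent."""
    curvature = 1.0 / b + 1.0 / r
    step = 0.5 / curvature          # halves the error each iteration for this quadratic
    x = xb
    for _ in range(iters):
        grad = (x - xb) / b - (y - x) / r
        x -= step * grad
    return x


def enkf_analysis(ensemble: list[float], y: float, r: float, seed: int = 0) -> list[float]:
    """Perturbed-observation EnKF: the background variance is estimated from the ensemble."""
    rng = random.Random(seed)
    b = statistics.variance(ensemble)
    gain = b / (b + r)
    return [x + gain * (y + rng.gauss(0.0, r ** 0.5) - x) for x in ensemble]


print(kalman_analysis(0.0, 1.0, 1.0, 1.0))  # (0.5, 0.5)
print(var_analysis(0.0, 1.0, 1.0, 1.0))     # converges to 0.5
```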

Coupled data assimilation will become critical for the initialization of
the future coupled models. This assimilation will need to include
atmospheric composition (aerosols, trace gases) as well as ocean, land 
surfaces and sea-ice. Each Earth-system component has particular pro- 
cess characteristics and space-time scales, and dealing with those in a 
fully unified data assimilation framework will be extremely challenging. 


Technological challenges 

Today’s highest-performance computers employed in NWP rank in the
top 20 of the 500 most powerful systems and execute computations at
petaflop rates (10^15 floating-point operations per second). They ingest of
the order of 100 Mbytes of observational data and produce of the order
of 10 Tbytes (that is, 10 × 10^12 bytes) of model output per day. Future
generations of global NWP models with kilometre scales in the horizontal
will integrate of the order of 100 prognostic variables over about
5 × 10^8 grid points for of the order of 100 ensemble members, with time
steps of seconds in an atmosphere with about 100 levels, coupled to
surface models of somewhat smaller dimensions. Observational data
usage will also increase by an order of magnitude owing to the
internationally coordinated availability of high-resolution spectrometers in
low-Earth and geostationary orbits with thousands of spectral channels.
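The size of such a kilometre-scale model can be checked with back-of-envelope arithmetic. The sketch below assumes a uniform 1-km grid over the Earth's surface, about 5.1 × 10^8 km^2. It uses the order-of-magnitude figures from the text: 100 levels, 100 prognostic variables and 100 members. Real models use quasi-uniform grids, so this is an estimate only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def horizontal_columns(dx_km: float) -> float:
    """Approximate number of grid columns for a uniform spacing dx_km."""
    surface_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return surface_km2 / dx_km ** 2


columns = horizontal_columns(1.0)
values_per_state = columns * 100 * 100          # levels x prognostic variables
print(f"{columns:.2e} columns")                 # about 5.1e+08
print(f"{values_per_state * 100:.2e} values")   # across 100 ensemble members
```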

However, the expected future high-performance computing techno- 
logy development will impose new constraints on how to address the 
science challenges. In the past, processor performance has evolved
according to Moore’s law, as have memory capacity and processor
clock speeds. This trend cannot be expected to continue, because
energy costs have to be reduced. In the future, much more emphasis will
be placed on parallel computing and this is where the ‘scalability’ of an 
application becomes important, providing time-to-solution gains when 
the model is run on more (and combinations of different types of) 
processors. The gain from the parallel execution of parts of the code is 
limited by the sequentially run elements, which fundamentally limits 
scalability, as does the need to exchange large amounts of data between 
processors. Making NWP codes more scalable is among the top priorit- 
ies in NWP for the next 10 years. 
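The limit described here is usually expressed as Amdahl's law. If a fraction p of the run time parallelizes perfectly and the rest runs sequentially, the speed-up on N processors is 1 / ((1 - p) + p / N). The sketch below is minimal and ignores the inter-processor communication costs the text also mentions, so real scaling is worse.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speed-up on `processors` when only `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


for n in (1_000, 100_000, 10_000_000):
    print(n, round(amdahl_speedup(0.999, n), 1))
```

Even with 99.9% of the work parallelized, the speed-up can never exceed 1 / 0.001 = 1,000, however many processors are added.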

For NWP centres such as ECMWF, the upper limit for affordable 
power usage may be about 20 MVA (ref. 91). The likely future NWP 
system will be of the order of 100-1,000 times larger as a computational 
task than today’s systems, and would require about 10 times more 
power. Figure 5 illustrates the increase in compute cores and electric 
power supply if model resolution is increased for a single forecast and a 
50-member ensemble, assuming today’s model design and available 
technology. To approach the resolutions of 1-5 km that are considered 
crucial for resolving convection, high-performance computers of unpre- 
cedented dimension and cost (assuming the use of conventional tech- 
nology) would be required. 
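A common rule of thumb explains why resolution increases are so expensive. Halving the horizontal grid spacing quadruples the number of columns, and numerical stability (the CFL condition) roughly halves the time step. Cost therefore grows with about the cube of the resolution ratio. The sketch below applies that assumption with vertical levels and model design held fixed; it is not the calculation behind Fig. 5.

```python
def relative_cost(current_dx_km: float, target_dx_km: float, exponent: float = 3.0) -> float:
    """Cost multiplier for refining the grid, assuming cost ~ (1/dx)**exponent."""
    return (current_dx_km / target_dx_km) ** exponent


for target in (5.0, 2.5, 1.0):
    print(target, relative_cost(15.0, target))
```

Under this assumption, going from 15 km to 1 km multiplies the cost of a single forecast by 15^3 = 3,375. That factor applies before any change in ensemble size or model complexity.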
Figure 5 | CPU and power requirements as a function of NWP model
resolution. Simplified illustration of the number of compute cores (left y-axis)
and power (in units of megavolt-amperes, MVA, right y-axis) required for a
single 10-day model forecast (lower curves) and a 50-member ensemble forecast
(upper curves) as a function of model resolution, given today’s model code
and compute technology. The shaded area indicates the range covered when
assuming perfect scaling (bottom curve) and inefficient scaling (top curve),
respectively. Today’s single global forecasts operate at around 15 km, while
ensembles have around 30 km resolution.


A change of paradigm is therefore needed regarding hardware, design
of codes, and numerical methods. New technologies will combine and
integrate low-power processors with the successors of today’s CPUs to
give the best of both worlds. The low-power processors offer highly
parallel compute performance with little data communication at lower
clock rates; the CPUs offer large memory, a fast data interface and
higher clock rates. Code design and algorithm choice must be adapted to this
technology by optimizing floating-point operation counts and memory
usage, which is a fundamental challenge given that we are dealing with
vast heritage codes with millions of lines of instructions. In 10 years, 
global ensemble forecasts will be run on of the order of 10°-10° proces- 
sors. Fault awareness and resilience management will be crucial, given 
the certainty of processor failures and the advent of inexact low-energy 
hardware.

This computing challenge is compounded by the requirements for data
distribution and archiving. While data growth appears slower than
compute growth, exabyte (10^18 bytes) data production may be reached
earlier than exaflop computing. Re-computing is even more costly
than archiving, so the data challenge will inevitably need to be
tackled with high priority. As with future processor technology,
hardware will limit data transfer bandwidth. Occasional hardware
failure needs to be actively accounted for by designing resilient
storage systems. Such failures also have fundamental implications for
the design of future workflows. Advanced data compression methods
need to be implemented, standardized and supported by the
weather and climate community.
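A simple form of lossy compression is linear quantization, in the spirit of GRIB "simple packing". Each field is stored as a reference value plus small integers, accepting a bounded loss of precision. The sketch below illustrates this under that assumption and is not a GRIB encoder. It packs values into `bits`-bit unsigned integers, so the reconstruction error is at most half a quantization step.

```python
def pack(values: list[float], bits: int = 16) -> tuple[float, float, list[int]]:
    """Quantize values to unsigned `bits`-bit integers with a linear scale."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0  # constant fields need no precision
    codes = [round((v - lo) / step) for v in values]
    return lo, step, codes


def unpack(lo: float, step: float, codes: list[int]) -> list[float]:
    """Reconstruct approximate values from the packed representation."""
    return [lo + c * step for c in codes]


field = [271.3, 288.9, 301.2, 255.0, 279.45]   # e.g. temperatures in kelvin
lo, step, codes = pack(field, bits=12)
restored = unpack(lo, step, codes)
max_error = max(abs(a - b) for a, b in zip(field, restored))
print(max_error <= step / 2 + 1e-9)  # True: error bounded by half a step
```

Fewer bits give higher compression and a proportionally larger maximum error. The bit depth is therefore the knob that trades storage against the "information loss" discussed in the text.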

Many technological opportunities and challenges will arise from future 
Earth observing systems. At the high end, new satellite instrument tech- 
nology will increasingly move towards hyper-spectral radiometers, with 
thousands of spectral channels sounding the atmospheric thermodyna- 
mical state and composition, together with active instruments (such as 
high-resolution radars and lasers) sounding surface characteristics, aero- 
sols, wind, water vapour, clouds and precipitation. Both instrument cat- 
egories can produce data rates of the order of 100 Gbytes per day that 
require downlinks, pre-processing, data dissemination within a few hours 
and ingestion in forecasting systems. The distribution and archiving of
these data volumes will need to be managed with a similar parallelized
approach as the model output. Data dissemination will only be feasible if
compression techniques are applied, potentially accepting ‘information
loss’. At the low end, the use of commodity devices such as mobile
phones to gather meteorological observations is only starting now.
These devices offer good sampling but lower accuracy, and they hold
potential for high-density observational networks in certain areas.

Figure 6 | Key challenge areas for NWP in the future. Advances in forecast
skill will come from scientific and technological innovation in computing,
the representation of physical processes in parameterizations, coupling of
Earth-system components, the use of observations with advanced data
assimilation algorithms, and the consistent description of uncertainties
through ensemble methods and how they interact across scales. The ellipses
show key phenomena relevant for NWP as a function of scales between 10°”
and 10* km resolved in numerical models and the modelled complexity of
processes characterizing the small-scale flow up to the fully coupled Earth
system. The boxes represent scale-complexity regions where the most
significant challenges for future predictive skill improvement exist. The arrow
highlights the importance of error propagation across the resolution range and
Earth-system components.

It is clear that scientific and technological challenges are interdepend- 
ent in many areas. The efficiency of computing and data handling 
imposes hard limits on model complexity in weather and climate models 
that are run within tight production schedules, and it will be challenging 
to run globally at 1 km convection-resolving scales. This trade-off 
between scientific and compute performance is not new, but ‘scalability’
issues add a new dimension.

The quiet revolution of numerical weather prediction has required 
combined scientific, observing and computational technology advances 
to be made. This combination is common to all natural sciences that 
necessitate the solution to large problems, such as simulating the neuro- 
logical connectivity of the human brain or the evolution of the galaxies 
in the cosmos. Further advances require more interdisciplinary research 
at the science-technology interface. As society’s requirement for more 
accurate and reliable information regarding weather and climate grows 
ever more pressing, global numerical models will need to increase in 
both resolution and complexity. This further progress in global NWP 
can be made but will require combined investment in all the elements 
reviewed in this paper, as summarized schematically in Fig. 6.
The C9orf72 repeat expansion disrupts
nucleocytoplasmic transport 
The hexanucleotide repeat expansion (HRE) GGGGCC (G4C2) in C9orf72 is the most common cause of amyotrophic lateral
sclerosis (ALS) and frontotemporal dementia (FTD). Recent studies support an HRE RNA gain-of-function mechanism of
neurotoxicity, and we previously identified protein interactors for the G4C2 RNA, including RanGAP1. A candidate-based
genetic screen in Drosophila expressing 30 G4C2 repeats identified RanGAP (the Drosophila orthologue of human RanGAP1),
a key regulator of nucleocytoplasmic transport, as a potent suppressor of neurodegeneration. Enhancing nuclear import
or suppressing nuclear export of proteins also suppresses neurodegeneration. RanGAP physically interacts with HRE
RNA and is mislocalized in HRE-expressing flies, in neurons derived from C9orf72 ALS patient induced pluripotent stem
cells (iPSC-derived neurons), and in C9orf72 ALS patient brain tissue. Nuclear import is impaired as a result of HRE
expression in the fly model and in C9orf72 iPSC-derived neurons. These deficits are rescued by small molecules
and antisense oligonucleotides targeting the HRE G-quadruplexes. Nucleocytoplasmic transport defects may be a
fundamental pathway for ALS and FTD that is amenable to pharmacotherapeutic intervention.


The G4C2 HRE in the C9orf72 gene is found in as many as 40% of
familial ALS and FTD cases, with additional reports in other
neurodegenerative diseases. C9orf72 HRE-induced cytotoxicity has
been proposed to be caused through loss- and gain-of-function
mechanisms. These include (1) transcribed sense (GGGGCC)n or
antisense (CCCCGG)n RNAs that sequester proteins, thus altering
their normal function; or (2) translation of the sense or antisense
expanded RNAs via repeat-associated non-AUG translation to
form toxic dipeptide repeat proteins (DPRs). We and others
have demonstrated that HRE RNA forms hairpin and
G-quadruplex structures that bind and sequester RNA binding
proteins (RBPs).


RanGAP suppresses HRE-mediated toxicity in Drosophila 
We previously identified 19 proteins that exhibit high affinity for G4C2
relative to a G:C scrambled RNA, along with ~400 additional proteins
that bind with moderate affinity to G4C2 and/or bind both G4C2 and
G:C scrambled RNA. To determine which of these candidate RBPs
genetically modify G4C2-mediated neurodegeneration, we performed a
screen in an established Drosophila model that expresses 30 G4C2 repeats
((G4C2)30) in the fly eye (Supplementary Table 1). One of the strongest
suppressors is a dominant, gain-of-function (GOF) allele of RanGAP,
called RanGAP*?(GOF), that functions similarly to overexpression of
wild-type RanGAP (refs 11-13). As shown in Fig. 1a, 1-day-old flies
expressing (G4C2)30 display subtle ommatidial disorganization defects
in the eye that worsen when aged for 15 days (Fig. 1b). However, flies
expressing the same repeats in the heterozygous RanGAP*”(GOF)
mutant background or with RanGAP overexpression appear normal
(Fig. 1a-c and Extended Data Fig. 1), indicating that RanGAP is a
suppressor of G4C2 repeat toxicity.


As shown in Fig. 1a, b, wild-type fly eyes have seven photoreceptor
neuron (PRN) rhabdomeres per ommatidium. In contrast,
the PRNs expressing 30 G4C2 repeats show a loss of integrity and/or
organization of rhabdomeres at day 15 (Fig. 1a, b, d), suggesting
age-dependent degeneration. These phenotypes are rescued by
either the heterozygous RanGAP*”(GOF) mutant or RanGAP
overexpression. Conversely, knockdown of RanGAP by RNA interference
(RNAi) significantly enhances the PRN defects (Fig. 1d and
Extended Data Fig. 1a). Moreover, RanGAP knockdown-mediated
enhancement of (G4C2)30-mediated degeneration worsens with
age, with an almost complete loss of rhabdomeres in aged flies.
This loss is not due to alterations in G4C2 mRNA level (Extended
Data Fig. 1b). These data indicate that RanGAP is a potent
suppressor of G4C2-mediated neurodegeneration in the Drosophila
eye.

To determine whether RanGAP also suppresses G4C2-mediated
toxicity in Drosophila motor neurons, we next analysed the effect
of RanGAP overexpression on the locomotor function of adult
flies. Neuronal expression of (G4C2)30 throughout adulthood
causes flight defects in 15-day-old flies that are rescued by
simultaneous overexpression of RanGAP (Extended Data Fig. 1c, d).
Interestingly, when expressed in motor neurons throughout larval
development using OK371-GAL4, (G4C2)30 causes severe
neuromuscular junction (NMJ) defects. These include an ~50% reduction
in active zone number and impaired evoked neurotransmitter
release, and they are not rescued by RanGAP overexpression (Extended Data
Fig. 2). Together, these data suggest that RanGAP suppresses
G4C2-mediated neurodegeneration during adulthood, whereas
(G4C2)30 expression during development causes neurotoxicity that
is independent of RanGAP.
Figure 1 | Genetic interaction between G4C2 repeats and nucleocytoplasmic
transport machinery. a, b, External eye morphology of 1-day-old (a, left
panels) and 15-day-old (b, left panels) flies. Phalloidin staining of the retina of
1-day-old (a, middle panels, magnified in right panels) and 15-day-old
(b, middle panels, magnified in right panels) flies. Wild-type control (top row),
flies expressing 30 G4C2 repeats (middle row), and flies expressing 30 G4C2
repeats and overexpressing RanGAP (bottom row) are shown. Genotypes: top
row, GMR-GAL4/+; middle row, GMR-GAL4, UAS-(G4C2)30/+; bottom row,
GMR-GAL4, UAS-(G4C2)30/+; UAS-RanGAP/+. c, d, Quantification of
external morphology (c) and rhabdomere number (d). *P < 0.05; **P < 0.01.


Nucleocytoplasmic transport modulates G4C2 toxicity
RanGAP functions in the cytoplasm to stimulate Ran GTPase (hereafter
referred to as Ran) to hydrolyse GTP to GDP, a process required for
efficient nucleocytoplasmic transport. Proteins larger than 40 kDa
require active transport to cross the nuclear pore complex (NPC). There,
their nuclear localization sequence (NLS) and/or nuclear export
signal (NES) are recognized by the carrier proteins importins and/or
exportins, respectively. In the nucleus, exportins bind Ran-GTP and cargo
proteins for export. Nuclear Ran guanine nucleotide exchange factor
(RanGEF) converts Ran-GDP back to Ran-GTP.

As shown in Fig. 1c, d and Extended Data Fig. 1a, b, overexpression
of RanGEF enhances G4C2-repeat-mediated degeneration, resulting
in large necrotic patches and severe rhabdomere degeneration. In
contrast, knockdown of RanGEF rescues both of these phenotypes.
These data suggest that RanGEF has an opposite role when compared
with RanGAP in G4C2-mediated neurodegeneration, consistent with
their opposing biochemical functions. Overexpression of importin-α
or knockdown of exportin rescues these phenotypes. Thus, genetically
enhancing nuclear import or inhibiting export of NLS/NES-containing
proteins suppresses G4C2-mediated neurodegeneration.

Expression of arginine-containing DPRs in Drosophila causes severe
toxicity, and poly-glycine-arginine (GR) DPRs are detected in flies
expressing 36 G4C2 repeats under the control of heat-shock-inducible
GAL4 (hs-GAL4). We are also able to detect polyGR DPRs in
hs-GAL4, UAS-(G4C2)30 flies when heat shocked. However, we are not able to
detect polyGR or polyGP DPRs when (G4C2)30 is expressed in the eye
with GMR-GAL4 at the time of PRN degeneration, or in adult neurons
with elavGS (Extended Data Fig. 3a, b). Nonetheless, we cannot exclude
the possibility that DPRs are expressed at undetectable levels and
contribute to degeneration in the eye.

Figure 2 | RanGAP binds to G4C2 repeats and is mislocalized along with
NPC components. a, EMSA of human RanGAP1 and repeat RNA in the
G-quadruplex conformation. b, RanGAP-HA pull down in the absence (lanes
4-6) or presence (lanes 7-9) of biotinylated G4C2 RNA repeats, immunoblotted
with an HA antibody. Lanes 1-3: 1/50 input. FL, full length. c, Wild-type
control (left) and G4C2 HRE (right) S2 cells expressing RanGAP-HA,
co-stained with an antibody against HA (red) and TO-PRO3 (blue). d, RanGAP1
co-localization with G4C2 RNA foci (dotted box: projected view; high
magnification: ~0.3 µm single plane) in a C9-ALS iPSC neuron in a confocal
single-plane image. e, RanGAP1 immunostaining in non-neurological disease
control and C9orf72 ALS motor cortex showing intense nuclear localization
(arrows) and aberrant nuclear aggregates (individual patient identifier in upper
right corner, Supplementary Table 2). f, Abnormal nuclear localization of
Nup205 in C9orf72 human motor cortex cells.


G4C2 repeats bind RanGAP and cause NPC pathology

To determine the relative affinity of RanGAP for G4C2 RNA, we
performed an electrophoretic gel mobility shift assay (EMSA) with
(G4C2)10 RNA and recombinant human RanGAP1 (Extended Data
Fig. 4a). The sense (G4C2)10 RNA G-quadruplex shows a
concentration- and length-dependent shift of free RNA to a lower-mobility
RNA-RanGAP1 complex with increasing concentrations of
RanGAP1 (Fig. 2a and Extended Data Fig. 4b-d). Additionally,
RanGAP1 demonstrates a higher binding affinity for the sense
strand G-quadruplex than for hairpins (Extended Data Fig.
4b-d), and very little interaction was observed between
RanGAP1 and (CUG)10. Furthermore, the RanGAP1-(G4C2)10
complex is resistant to antisense oligonucleotides against the
G4C2 repeat and to nonspecific RNA competitor, even at 1,000-fold
RNA molar excess (Extended Data Fig. 4e). These in vitro results
indicate that RanGAP1 preferentially binds the sense RNA
G-quadruplex from the C9orf72 HRE.
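Concentration-dependent gel shifts are often summarized by a fraction-bound curve. The sketch below is a hedged illustration only and rests on these assumptions:

- binding is simple one-site binding;
- the protein is in excess over the RNA, so the bound fraction is [P] / (Kd + [P]);
- a dissociation constant can be recovered from fraction-bound data by least squares over a grid of candidate Kd values.

The article does not report a Kd. The concentrations and the Kd of 8 nM used to generate the synthetic data below are hypothetical.

```python
def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """One-site binding with protein in excess: bound fraction of the RNA."""
    return protein_nm / (kd_nm + protein_nm)


def fit_kd(conc_nm: list[float], bound: list[float]) -> float:
    """Least-squares Kd (nM) over a log-spaced grid of candidate values."""
    candidates = [10 ** (k / 100) for k in range(-200, 401)]  # 0.01 nM to 10,000 nM

    def sse(kd: float) -> float:
        return sum((fraction_bound(c, kd) - f) ** 2 for c, f in zip(conc_nm, bound))

    return min(candidates, key=sse)


conc = [0, 1, 2, 5, 10, 20, 100]                    # nM, chosen for illustration
synthetic = [fraction_bound(c, 8.0) for c in conc]  # synthetic data, hypothetical Kd = 8 nM
print(round(fit_kd(conc, synthetic), 1))
```

Real gel data would first need band quantification and background correction. If the protein is not in large excess over the RNA, a quadratic binding model is more appropriate than this hyperbolic one.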
To confirm that RanGAP interacts with G4C2 RNA in cells, we
expressed carboxy-terminal haemagglutinin (HA)-tagged Drosophila
RanGAP protein in S2 cells and performed RNA pull down. As shown
in Fig. 2b, western blot analysis demonstrates that the full-length and
C-terminal domain of RanGAP physically interact with G4C2-repeat
RNA in Drosophila cells. Both endogenous and transfected RanGAP-HA
uniformly surround the nuclei of control cells (Fig. 2c and
Extended Data Fig. 5a, b). In contrast, expression of (G4C2)30 leads
to the formation of large perinuclear RanGAP puncta, which is not due to
induction of apoptosis (Extended Data Fig. 5a).

In parallel with studies in Drosophila, we investigated RanGAP1 in
iPSC-derived neurons (hereafter referred to as iPSC neurons) derived
from multiple C9orf72 ALS (C9-ALS) patients (Extended Data Fig. 6
and Supplementary Tables 3 and 4). Our iPSC cultures comprise
30-40% motor neurons (Islet-1+) and are predominantly excitatory
neurons (vGLUT+). They also express additional motor neuron markers,
neural-specific cytoskeletal proteins and synaptic proteins
(Supplementary Table 3 and Extended Data Fig. 6b, c). Consistent
with our observations in S2 cells, iPSC neurons from C9-ALS patients
variably exhibit RanGAP1 puncta (Extended Data Fig. 5c), and
RanGAP1 can co-localize with G4C2 RNA (Fig. 2d). To determine
whether RanGAP1 mislocalization occurs in human disease, we
analysed brain tissue of C9-ALS patients. Cells in C9-ALS motor
cortex commonly exhibit mislocalized, discontinuous and large
punctate RanGAP1 signals, compared with the smooth perinuclear staining
observed in controls (Fig. 2e, Extended Data Fig. 7a-d and
Supplementary Table 2). Similar pathology was not readily observed
in C9-ALS cerebellar cortex (Extended Data Fig. 7e). Perinuclear
cytoplasmic RanGAP1 puncta occasionally co-localize with ubiquitin
(Extended Data Fig. 5e). We next asked whether RanGAP1 puncta
contain other protein components of the NPC, and therefore stained
for nucleoporin 205 (Nup205), an extremely long-lived NPC scaffold
protein. We found that RanGAP1 and Nup205 co-localize and are
predominantly perinuclear in control iPSC neurons and brain tissue
(Fig. 2f). Interestingly, Nup205 co-localizes with some RanGAP1
aggregates in C9-ALS iPSC neurons (Extended Data Fig. 5d).
Consistent with this observation, Nup205 and Nup107 exhibit motor
cortex pathology similar to that of RanGAP1 in multiple C9-ALS patients
(Fig. 2f, Extended Data Fig. 7f and Supplementary Table 2). These
data suggest that RanGAP1 and additional components of the NPC
are disrupted in C9-ALS patients.


The Ran gradient is disrupted by the C9orf72 HRE 


We then tested whether sequestration of RanGAP by the G4C2 RNA
leads to its loss of function. Most Ran protein is imported into the
nucleus, a process that requires its binding to GDP, but not GTP.
Hence, defects in RanGAP1 activity might affect the
nuclear-cytoplasmic (N/C) distribution of Ran. Indeed, we observed a
significant reduction in the N/C ratio of Ran in S2 cells expressing (G4C2)30
(Fig. 3a), suggesting that RanGAP function is impaired.

Next, we quantified nuclear and cytoplasmic Ran in C9-ALS iPSC 
neurons via immunofluorescence (Fig. 3b-d and Extended Data 
Fig. 8b, c). We observed a significant reduction in the N/C ratio of 
endogenous Ran in the C9-ALS lines tested in mature MAP2+ iPSC
neurons at 50-70 days in vitro (DIV) (Fig. 3c, d). Ran gradient abnormalities were also detected in mature ChAT+ neurons within the same
cultures (Fig. 3e and Extended Data Fig. 8a). A functional Ran-GFP
fusion protein overexpressed in C9-ALS iPSC neurons also showed a
reduced N/C gradient (Extended Data Fig. 8b, c).

RanGAP1-GFP overexpression in C9-ALS iPSC neurons rescued 
the disrupted N/C Ran gradient to control levels (Extended Data Fig. 
8d), demonstrating that altered RanGAP1 function contributes to the 
disrupted N/C Ran ratio. The disrupted Ran gradient is not due to
apoptosis, since treatment of control iPSC neurons with tunicamycin
does not alter the N/C Ran ratio despite elevating activated caspase 3
levels (Extended Data Fig. 8e), and RanGAP1 and Ran mislocalization
were not observed in iPSC astrocytes derived from C9-ALS
patients (Extended Data Figs 6d and 8f-i). Taken together, our fly
and human iPSC data indicate that the G4C2 HRE impairs neuronal
RanGAP1 function, resulting in higher levels of cytoplasmic Ran
protein.

Figure 3 | C9orf72 HRE disrupts the nuclear/cytoplasmic Ran gradient.
a, S2 cells co-transfected with GFP and (G4C2)30 (bottom row) or control (top row) and stained with a Ran antibody (red) and TO-PRO3 (blue). Error bars indicate s.e.m. b, iPSC neurons from control and C9-ALS patients showing mislocalization of Ran to the cytoplasm in C9-ALS iPSC neurons. c, Quantification of N/C Ran gradient in neurons from four control and four C9-ALS iPS lines, normalized to control. N/C Ran ratio is reduced in C9-ALS neurons. Each symbol represents the mean of up to 228 neurons per line (see Supplementary Table 4). Bars indicate mean N/C Ran of four control or C9-ALS lines; error bars indicate s.e.m. d, N/C Ran histogram shows higher frequency of lower N/C ratios in four C9-ALS lines as compared to the four control lines. N/C ratios are presented as raw values. e, C9-ALS ChAT+ neurons show similar reduction of N/C Ran. N/C Ran is normalized to controls and up to 60 neurons were tested per line (see Supplementary Table 4) (**P < 0.01, ****P < 0.0001). Error bars indicate s.e.m.


The C9orf72 HRE inhibits import of nuclear proteins 

To determine whether the HRE significantly impairs nuclear import, 
we overexpressed a GFP protein tagged with both a classical NLS and
an NES (NLS-NES-GFP) in the Drosophila salivary gland, where the
cytoplasm and nucleus are large and distinct. NLS-NES-GFP is localized to both nuclei and cytoplasm of wild-type salivary gland cells
(Fig. 4a). However, in cells expressing (G4C2)30, the N/C ratio of
NLS-NES-GFP is severely reduced (Fig. 4a and Extended Data Fig. 9a),
suggesting that nuclear import is inhibited and/or that nuclear export 
is enhanced. 

Next, we expressed a GFP protein tagged with an NLS and a
mutated NES that severely impairs its export activity (ΔNES).
In control cells, NLS-ΔNES-GFP localizes primarily to the nucleus
(Fig. 4a), whereas in cells expressing (G4C2)30 it localizes predominantly to the cytoplasm (Fig. 4a and Extended Data Fig. 9a), supporting an impairment of nuclear import in these cells. Using immunoblot
analysis, we confirmed that the levels of GFP protein are similar in
control and (G4C2)30-expressing salivary glands (Extended Data Fig.
9b). We also detected cytoplasmic mislocalization of NLS-NES-GFP
and NLS-ΔNES-GFP in glutamatergic neurons of the ventral nerve
cord in (G4C2)30-expressing flies (Extended Data Fig. 9d), indicating
that the G4C2 HRE also affects nucleocytoplasmic transport in
Drosophila motor neurons. Therefore, expression of the G4C2 HRE
decreases nuclear import in Drosophila cells in vivo.
Figure 4 | C9orf72 HRE causes nucleocytoplasmic transport defects.
a, Salivary glands expressing NLS-NES-GFP or NLS-ΔNES-GFP were co-stained for GFP, TBPH (red) and nuclei (blue, insets). Scale bars, 20 μm. b, Representative images of NLS-tdTomato-NES FRAP analysis in control and C9-ALS iPSC neurons (control, n = 34; C9-ALS, n = 29). c, Quantification of nuclear recovery (FRAP) of two C9-ALS and control iPS lines. Error bars indicate s.e.m. d, Representative images of control and C9-ALS iPSC neurons. Arrows indicate higher cytoplasmic Ran and TDP-43 signals. e, Quantification of mean N/C ratio of TDP-43 of four control and four C9-ALS lines when normalized to controls. Each symbol represents up to 49 neurons per line (see Supplementary Table 4). Error bars indicate s.d. f, Histogram shows higher frequency of lower N/C TDP-43 ratio. N/C ratios are presented as raw values. g, N/C TDP-43 directly correlates with N/C Ran ratio across all lines tested. N/C TDP-43 versus N/C Ran: control, P < 0.0001, r² = 0.2980; C9-ALS, P < 0.0001, r² = 0.1657. *P < 0.05; **P < 0.01; ***P < 0.001.

We next investigated the effects of nuclear import deficits on candidate nuclear NLS- or NES-containing proteins. TDP-43 (TBPH in Drosophila), a predominantly nuclear protein, contains both a classical NLS and NES, and it is depleted from the nucleus of some CNS neurons and glia in most ALS patients and ~45% of FTD patients. We therefore hypothesized that its nuclear localization would be affected if nuclear import were disrupted. Indeed, loss of nuclear Ran correlates with depletion of nuclear TDP-43 in an FTD mouse model. As shown in Fig. 4a and Extended Data Fig. 9c, the N/C ratio of endogenous TBPH is significantly reduced in (G4C2)30-expressing salivary gland cells.

To validate our observations in human neurons, we investigated nucleocytoplasmic transport in iPSC neurons by expressing a tdTomato reporter with a classical NLS and NES and performing fluorescence recovery after photobleaching (FRAP) of neuronal nuclei. We observed reduced nuclear recovery of NLS-tdTomato-NES in C9-ALS iPSC neurons when compared with control lines (Fig. 4b, c). This defect was associated with disruption of TDP-43 localization, as C9-ALS iPSC neurons exhibit variable, but significantly reduced, N/C ratios for TDP-43 (Fig. 4d-f). N/C ratios of Ran and TDP-43 correlate in control and C9-ALS iPSC neurons (Fig. 4g), consistent with previous findings that the nuclear import of TDP-43 is Ran dependent. These data indicate that the C9orf72 HRE leads to impaired nuclear import of proteins that contain a classical NLS.


Rescue of HRE-mediated neurodegeneration


To determine whether antisense oligonucleotides targeting the C9orf72 RNA rescue the disrupted N/C Ran ratio observed in C9-ALS iPSC neurons, we treated these cells with previously described antisense oligonucleotides targeting the C9orf72 sense strand or with scrambled control antisense oligonucleotides. Treatment with the sense-strand-targeting antisense oligonucleotide reduced RNA foci in C9-ALS iPSC neurons (Extended Data Fig. 8j) and fully rescued the disrupted N/C Ran gradient (Fig. 5a), suggesting that the nucleocytoplasmic transport deficits are due to C9orf72 sense-strand toxicity. Notably,
when C9-ALS iPSC neurons were treated with these antisense oligo- 
nucleotides, both the N/C Ran and TDP-43 gradients were increased 
(Extended Data Fig. 8k). The antisense oligonucleotide also suppressed nuclear import defects caused by G4C2 repeats in vivo:
Drosophila larvae co-expressing (G4C2)30 and NLS-ΔNES-GFP were
raised on food supplemented with an antisense oligonucleotide
throughout larval stages, which mitigated the mislocalization of
NLS-ΔNES-GFP in salivary glands (Fig. 5b).

RanGAP1 binds the G4C2 RNA G-quadruplex in vitro (Fig. 2a).
We therefore tested whether this interaction can be perturbed by
a porphyrin compound, TMPyP4, that destabilizes RNA G-quadruplex
tertiary structures. TMPyP4 reduces the affinity of RanGAP1 for the
(G4C)19 G-quadruplex in a dose-dependent manner (Fig. 5c). TMPyP4 
also rescues nuclear import defects in the fly model in a dose-dependent 
manner (Fig. 5d). Thus, destabilization of the G4C2 G-quadruplex
structure significantly suppresses HRE-mediated nuclear import deficits.
Interestingly, these phenotypes are also suppressed using an exportin 1 
inhibitor, KPT-276 (Fig. 5e), suggesting that inhibiting nuclear 
export may compensate for disrupted import. Notably, antisense 
oligonucleotide, KPT-276 or TMPyP4 treatments all significantly suppress
G4C2-mediated neurodegeneration in the eye (Fig. 5f). Hence,
our data suggest that modulation of nucleocytoplasmic transport pre- 
sents a potential therapeutic strategy for neurodegenerative diseases 
characterized by the C9orf72 HRE. 


Discussion 

Our data demonstrate that the G4C2 repeat expansion disrupts nucleocytoplasmic transport in a fly model and in human cells (Extended Data Fig. 10). While our data suggest that RanGAP is a key target of the G4C2 repeat expansion, other members of the NPC may also interact directly or indirectly with G4C2 repeats. Several human genetic studies have implicated nuclear transport deficits as the cause of a rare fetal motor neuron disease and infrequent cases of ALS, including studies on the role of the nucleoporin GLE1 implicated in mRNA export. In addition, irregularities of the nuclear membrane and distribution of nuclear pore proteins were recently noted in sporadic ALS tissue. An accompanying paper (ref. 36) independently identified additional components of the NPC and nucleocytoplasmic trafficking pathways as dominant modifiers of G4C2 HRE toxicity in another C9-ALS fly model. Importantly, the NPC and nucleocytoplasmic trafficking defects observed in both iPS-cell-derived neurons and motor neurons in our study are relevant to both ALS and FTD. Taken together, these studies suggest that products of the C9orf72 HRE disrupt nucleocytoplasmic transport at the NPC and that this disruption is a fundamental mechanism for inducing cellular injury in ALS and FTD. These defects may account for the nuclear depletion and cytoplasmic accumulation of TDP-43 widely seen in C9-ALS and FTD.

Although our data only demonstrate a role for disruption of nuclear import in C9-ALS pathogenesis, the robust nuclear pore pathology that we detect suggests that both nuclear import and export may be affected. It is tempting to speculate that NPC dysfunction leads to age-related neurodegeneration, since many of the NPC components, including Nup205, are extremely long-lived, and NPC integrity is lost during normal ageing.

Figure 5 | Pharmacological rescue of nucleocytoplasmic transport defects.
a, Neuronal N/C Ran ratio in control and two C9-ALS iPS lines shows increased cytoplasmic Ran levels in untreated and scrambled antisense-oligonucleotide-treated C9-ALS iPSC neurons (n = 50 neurons per line; see Supplementary Table 4). b, Salivary glands of larvae expressing G4C2 HRE and NLS-ΔNES-GFP were untreated (top) or treated with 5 μM antisense oligonucleotide (ASO, bottom) and co-stained for GFP (green), TO-PRO3 (blue) and antisense oligonucleotide (white). N, nuclear; W, whole cell. c, EMSA of RanGAP1 and repeat RNA in the presence of TMPyP4 (top panel) and relative change in fraction bound (bottom panel). d, e, Salivary glands of larvae expressing G4C2 HRE and NLS-ΔNES-GFP were treated with different concentrations of TMPyP4 (d) or KPT-276 (e) versus vehicle control and co-stained for GFP (green) and TO-PRO3 (blue). Scale bars, 20 μm. f, The effects of antisense oligonucleotide, KPT-276 and TMPyP4 on the external morphology of eyes expressing G4C2 repeats. *P < 0.05; **P < 0.01. All error bars indicate s.e.m.


The sense strand appears to be the cause of the described nucleocy- 
toplasmic trafficking deficits in our human and fly model systems, as 
small molecules targeting the sense RNA suppress the nuclear import 
phenotypes, and neurodegeneration is caused by expression of G4C2
repeat RNA in C9orf72 iPSC neurons or Drosophila. While we cannot
exclude DPRs as a contributor to nucleocytoplasmic trafficking defects,
our data in multiple model systems are most consistent with an RNA- 
mediated mechanism. Future studies will be required to determine the 
contribution of RanGAP disruption in C9-ALS pathogenesis com- 
pared with other pathogenic mechanisms implicated in C9-ALS such 
as nucleolar stress, which could act independently or in conjunction 
with nucleocytoplasmic transport disruption.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


For all Drosophila experiments, the experiments were not randomized and the inves- 
tigators were not blinded to allocation during experiments and outcome assessment. 
Investigators were blinded for analysis of all iPSC neuron experiments investigating 
the N/C Ran ratio, N/C TDP-43 ratio, and FRAP live imaging. 

Drosophila genetics. To identify genetic modifiers of the G4C2 HRE, we performed a candidate-based screen as follows: if a candidate RBP was conserved between human and Drosophila, we obtained RNAi lines against the Drosophila homologue(s) from the TRiP collection (Supplementary Table 1). In addition, if the RBP consistently exhibited high affinity for G4C2 RNA, we also obtained published mutant alleles of its homologues. RanGAP^Sd(GOF) refers to the RanGAP^Sd 'segregation distortion' gain-of-function allele. We recombined GMR-GAL4 and UAS-(G4C2)30 (ref. 10) and crossed the balanced line, GMR-GAL4, UAS-(G4C2)30/CyO, twiGFP, to RNAi or mutant lines. We selected progeny that either co-expressed both the repeats and the RNAi (GMR-GAL4, UAS-(G4C2)30/+; UAS-RNAi/+, where the UAS-RNAi can be on any chromosome), or expressed the repeats in a heterozygous mutant background (GMR-GAL4, UAS-(G4C2)30/+; mut/+, where mut can be on any chromosome). We aged flies for 15 days and compared the morphology of their eyes with 15-day-old control flies expressing only the repeats. We used a modification index ranging from −4 to 4 to describe the relative severity of the morphological defects (Supplementary Table 1), where 0 is the repeat-expressing control. A positive number indicates enhancement of the phenotype, whereas a negative number indicates rescue. A score of 4 was given if the flies had no eyes, whereas a score of −4 was given if the eyes appeared indistinguishable from those of the wild-type control. If the flies failed to eclose, we recorded the phenotype as 'lethal'.
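The modification-index convention above can be illustrated with a short sketch. The function name and the category labels ("enhancer", "suppressor") are hypothetical; in the screen itself the index values were assigned by visual comparison with the repeat-only control, not computed.

```python
from typing import Union


# Hypothetical helper: interprets a modification index from the candidate screen.
# Convention from the text: 0 = repeat-only control, >0 = enhancement,
# <0 = rescue, 4 = no eyes, -4 = wild-type-like eyes, "lethal" = no eclosion.
def classify_modifier(index: Union[int, str]) -> str:
    if index == "lethal":
        return "lethal"
    # bool is a subclass of int, so reject it explicitly
    if isinstance(index, bool) or not isinstance(index, int) or not -4 <= index <= 4:
        raise ValueError(f"expected an integer in [-4, 4] or 'lethal', got {index!r}")
    if index > 0:
        return "enhancer"
    if index < 0:
        return "suppressor"
    return "no modification"


for score in (-4, 0, 2, "lethal"):
    print(score, classify_modifier(score))
```

Raising on out-of-range values keeps mistyped scores (for example 5) from being silently counted as enhancers.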

In our genetic interaction analyses, we used a previously described method to
quantify disruption in the external morphology of the eye, that is, ‘degeneration 
score’. Briefly, points were added if there was complete loss of interommatidial 
bristles, necrotic patches, retinal collapse, loss of ommatidial structure, and/or 
depigmentation of the eye. 
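A minimal sketch of the degeneration score, assuming each listed feature adds one point. The cited method may weight features differently, and the feature labels below are hypothetical names for the criteria in the text.

```python
from typing import Iterable

# Hypothetical labels for the five criteria listed above.
# Assumption: each feature present adds exactly one point.
DEGENERATION_FEATURES = (
    "loss_of_interommatidial_bristles",
    "necrotic_patches",
    "retinal_collapse",
    "loss_of_ommatidial_structure",
    "depigmentation",
)


def degeneration_score(observed: Iterable[str]) -> int:
    """Number of listed degeneration features present in one eye."""
    present = set(observed)  # duplicates count once
    unknown = present - set(DEGENERATION_FEATURES)
    if unknown:
        raise ValueError(f"unrecognized features: {sorted(unknown)}")
    return len(present)


def mean_score(eyes: Iterable[Iterable[str]]) -> float:
    """Average degeneration score over the eyes scored for one genotype."""
    scores = [degeneration_score(eye) for eye in eyes]
    if not scores:
        raise ValueError("no eyes scored")
    return sum(scores) / len(scores)
```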

For the subcellular localization of GFP, OK371-GAL4; UAS-(G4C2)30/TM6b,
Tb, tub::GAL80 was crossed to UAS-NLS-NES(P12)/TM6b, Tb (III) and non-Tb
offspring were selected for analysis (NES(P12) is referred to as ΔNES). OK371-GAL4/UAS-NLS-NES-GFP flies were used as a negative control. We did not
observe any GFP signals in the third instar salivary glands of OK371-GAL4;
UAS-(G4C2)30/+ animals. All other fly stocks are from Bloomington
Drosophila Stock Center, except for the UAS-RanGAP lines generated in this 
study. 

To induce G4C2 RNA expression using elavGS (ref. 40), flies were raised at
29 °C on regular food supplemented with 300 μM RU486. Flies were transferred
to freshly made food every 2-3 days.
Quantitative RT-PCR. For each genotype, mRNA was collected from 30 fly heads
using the TRIzol reagent (Life Technologies) following the manufacturer's protocol.
Reverse transcription was performed using the SuperScript III First-Strand Synthesis kit
(Life Technologies) following the manufacturer's protocol. Quantitative PCR was
performed using the SYBR Green PCR system (Applied Biosystems) on a 7900HT Fast
Real-Time PCR system (Applied Biosystems). The following primers were used. For
actin: forward 5′-GCGCGGTTACTCTTTCACCA-3′, reverse 5′-ATGTCACGGACGATTTCACG-3′. For G4C2 repeats: forward 5′-GGGATCTAGCCACCATGGAG-3′, reverse 5′-TACCGTCGACTGCAGAGATTC-3′.

The primers for G4C2 repeats were designed to amplify a region immediately 3′ of
the repeats in the UAS construct.
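The Methods do not state how Ct values were converted into expression levels. As an assumption only, the sketch below uses the common 2^-ΔΔCt approach with actin as the reference gene and approximately 100% amplification efficiency for both primer pairs. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


# Assumption: 2^-ddCt quantification with actin as reference and ~100% efficiency;
# the source does not specify its quantification method.
def relative_expression(
    ct_target: Sequence[float],
    ct_actin: Sequence[float],
    ct_target_calibrator: Sequence[float],
    ct_actin_calibrator: Sequence[float],
) -> float:
    """Fold change of the G4C2-repeat amplicon in a sample relative to a calibrator genotype."""
    delta_sample = mean(ct_target) - mean(ct_actin)  # technical replicates averaged
    delta_calibrator = mean(ct_target_calibrator) - mean(ct_actin_calibrator)
    return 2.0 ** -(delta_sample - delta_calibrator)
```

If primer efficiencies differ markedly, an efficiency-corrected model would be more appropriate than this simple form.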

Flight assay. The flight assay was performed as described. Briefly, individual
15-day-old female flies were dropped into a graduated cylinder through a hole in its
lid. The cylinder was graduated into 12 zones of 25 mm each (top: 0; bottom: 12).
The landing height was noted as the zone number in which the fly landed.
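A sketch of converting landing positions into zone scores, assuming each position is recorded as the distance below the top graduation in millimetres. That mapping and the helper names are hypothetical; the text only specifies the 12 zones of 25 mm numbered from 0 (top) to 12 (bottom).

```python
from typing import Iterable

ZONE_HEIGHT_MM = 25.0
N_ZONES = 12  # zones numbered from the top (0) down to the bottom (12), as in the text


def landing_zone(depth_mm: float) -> int:
    """Zone number for a fly landing depth_mm below the top of the cylinder.

    Assumption: landing position is measured as distance from the top graduation.
    """
    if not 0.0 <= depth_mm <= ZONE_HEIGHT_MM * N_ZONES:
        raise ValueError("landing position outside the graduated cylinder")
    return min(int(depth_mm // ZONE_HEIGHT_MM), N_ZONES)


def mean_landing_zone(depths_mm: Iterable[float]) -> float:
    """Mean landing zone for a group of flies; lower values mean flies landed higher."""
    zones = [landing_zone(d) for d in depths_mm]
    if not zones:
        raise ValueError("no flies scored")
    return sum(zones) / len(zones)
```

Because weaker fliers fall farther, a higher mean zone indicates poorer flight performance.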
Electrophysiological recording. For fly third instar larvae, neuromuscular junction
(NMJ) recordings were performed from muscle 6 in segments A3 and A4 at room
temperature in HL3 saline containing 1.5 mM Ca2+, as described.

For iPS cells, whole-cell patch-clamp recordings were performed to assess the
functionality of iPSC neurons. Neurons were perfused in HEPES-buffered
extracellular solution (143 mM NaCl, 5 mM KCl, 2 mM CaCl2, 1 mM MgCl2,
10 mM HEPES, 10 mM glucose, pH 7.2, 300-310 mOsm) in the presence of
1 μM TTX and 20 μM bicuculline. Whole-cell recording pipettes (4-8 MΩ)
were filled with a Cs-based internal solution (115 mM CsMeSO3, 0.4 mM
EGTA, 5 mM TEA-Cl, 2.8 mM NaCl, 20 mM HEPES, 3 mM MgATP, 0.5 mM
Na2GTP, pH 7.2, 290-300 mOsm) for voltage-clamp mEPSC recordings or with
a K+-based internal solution (2.7 mM KCl, 120 mM KMeSO3, 9 mM HEPES,
0.18 mM EGTA, 4 mM MgATP, 0.3 mM Na2GTP, 20 mM phosphocreatine(Na),
pH 7.3, 295 mOsm) for current-clamp experiments. Cells were held at −70 mV,
and recordings were performed at room temperature. Signals were
measured with a MultiClamp 700B amplifier and digitized using a Digidata 1440A
analogue-to-digital board (Molecular Devices). Data were acquired
with pClamp 10.3 software at a sampling rate of 5 or 20 kHz.

iPSC generation and differentiation to neurons. Patient fibroblasts were 
collected at Johns Hopkins Hospital with patient’s consent (IRB protocol: 
NA_00021979) as described previously. iPSC lines were created and initially
characterized under an NIH-sponsored commercial agreement with iPierian
(USA) using the 4 vector method. Sox2, Oct4, Klf4 and c-Myc encoding vectors
were transduced into human fibroblasts using retroviral delivery. Selected colonies were evaluated for expression of multiple pluripotency markers by quantitative PCR (qPCR) and/or immunocytochemistry. In vitro pluripotency was
further determined by three germ layer differentiation via embryoid body formation. iPSCs were maintained in mTeSR1 (StemCell Technology) and passaged once
a week using dispase (StemCell Technology) following the manufacturer's
instructions. Partially differentiated colonies were removed manually before dif- 
ferentiation analyses. The iPSCs were differentiated to neuroprogenitor cells, 
neurons and motor neurons via embryoid body (EB) formation by following 
the methods described previously (Supplementary Table 3). At day 32 of
differentiation, iPSC neurons were treated with 2014M Ara-C (Sigma) for 48h to
remove iPS glial progenitor cells and enrich for iPSC neurons. iPSC neuronal 
differentiation was confirmed by class III β-tubulin (Tuj1) immunostaining
(Chemicon AB9354, 1:1,000), and cultures used for subsequent experiments were
plated onto a confluent layer of mouse astrocytes, and analyses were performed at
55-69 DIV. Differentiation was assessed by immunofluorescence for the presence
of MAP2-positive cells (SySy 188 004; 1:1,000) and neuronal morphology.
Approximately 85-90% of cells were VGlut1+ (SySy 135 303; 1:500), ~10% were
VGAT+ (SySy 131 002; 1:500), ~40% were Islet-1+ (DSHB, 40.3A4; 1:50) and
~90% were ChAT+ (Millipore, AB144P, 1:300). All lines were
analysed at 50-70 DIV.

Molecular cloning. RanGAP full-length and/or truncated cDNAs were retrieved
from cDNA clone LD16356 and subcloned into the pUASt-attB vector using BglII
and NotI sites. An HA tag was added at the C terminus. The NLS-tdTomato-NES
construct was provided by the Hetzer lab (Salk Institute) and was subcloned into the
PrecisionShuttle Lenti vector with C-terminal Myc-DDK tag (OriGene, catalogue
number PS100064) using MluI and XhoI cloning sites.

Transgenic flies. Transgenic flies containing RanGAP cDNA constructs were 
generated by injecting the plasmid into y w;; PBac{yellow[+]-attP-3B}VK00033
(chromosome III) embryos (BestGene, Inc.).

Collection of human autopsied tissue. Human autopsied tissue used for these 
data are described in detail in Supplementary Table 2. The use of human tissue 
and associated decedents' demographic information was approved by the Johns
Hopkins University Institutional Review Board and ethics committee (HIPAA
Form 5 exemption, Application 11-02-10-01RD); additional tissue was obtained from the Ravits Laboratory
(UCSD) through the Target ALS Consortium.

Antisense oligonucleotide treatment for iPSC neuronal cultures. Modified 2′-O-methoxyethyl (MOE)/DNA antisense oligonucleotides were generated by Isis
Pharmaceuticals. For treatment, antisense oligonucleotides were incubated in
neural differentiation media (NDM) at 3 μM, then added to the iPSC neuronal
cultures and replenished every 3 days, for a total of 10 days of antisense
oligonucleotide treatment. The sequence of antisense oligonucleotide 577061,
which targets a region upstream of the G4C2 repeat, is 'TACAGGCTGCGGTTGTTTCC',
and the sequence of the scrambled non-targeting control antisense
oligonucleotide 141923 is 'CCTTCCCTGAAGGTTCCTCC'.

RNA fluorescent in situ hybridization (FISH) and immunofluorescence. 
RNA-FISH of iPSC neurons was performed as previously described. Briefly, 5′
digoxigenin (DIG)-labelled locked nucleic acid FISH probes used were generated 
by Exiqon and targeted the GGGGCC repeat (CCCCGG, 5) (batch 611635) or a 
non-targeting scrambled probe (300514-04) as a control. Cell cultures were fixed 
in buffered 4% PFA, equilibrated in 1X SSC with 40% formamide, incubated at 
37 °C for 10 min, then incubated with preheated probes (90-95 °C) at 75 °C for 
35 min in 50% formamide and hybridization buffer containing 20 nM of probe. 
Following hybridization, cells were washed twice with 50% formamide at 55 °C for
15 min each and then washed with 2X SSC 5 times for 5 min each. 

Cells processed for RNA-FISH and protein immunofluorescence for RanGAP1
were treated once with Tris-glycine and processed for standard protein
immunofluorescence. Blocking buffer and immunofluorescence buffer consisted of
10% and 5% protease-free, heat-shock BSA fraction V (Roche), respectively, in
RNase-free 1X Tris-buffered saline. To detect the DIG-labelled probe, an
unconjugated mouse anti-DIG antibody (Jackson Immunoresearch 200-002-156;
1:400) and a RanGAP1 antibody (Santa Cruz sc-25630; 1:500) were used, followed
by the appropriate secondary antibodies (Jackson). Cells then underwent a series of
5 min washes with immunofluorescence buffer, Tris-buffered saline, Tris-glycine,
PBS with MgCl2, and PBS, respectively. Cells were mounted onto slides with
ProLong Antifade Gold mounting media with DAPI (Invitrogen).
Drosophila cell culture. S2 cells were cultured in Schneider's media supplemented
with fetal bovine serum and antibiotics at 25 °C. Transfections were
performed using Lipofectamine LTX (Life Technologies) following the manufacturer's instructions. G4C2 repeat RNA was transcribed using the UAS/GAL4
system, driven by Act-GAL4. For immunofluorescent staining, cells were fixed
48 h after transfection. For actinomycin treatment to induce apoptosis, cells were
treated with 0.7 μM actinomycin D for 20 h.

Immunofluorescence, phalloidin staining and immunohistochemistry. For 
immunofluorescence staining in Drosophila, tissues or S2 cells were fixed in 
3.7% formaldehyde for 30 min, followed by incubation in PBX solution (PBS 
with 0.4% Triton X-100) for 1h. The tissues or cells were then incubated with 
primary antibodies and 10% normal goat serum (NGS) in PBX for 16h at 4°C. 
Primary antibodies were used at the following concentrations: mouse anti-Brp 
(DSHB), 1:100; mouse anti-GFP (Life technologies), 1:200; rat anti-HA (Roche), 
1:200; rabbit anti-RanGAP (a gift from C. Staber, Stowers Institute), 1:500; mouse 
anti-Ran (BD Biosciences), 1:200; rabbit anti-TBPH (a gift from F. Hirth, King's
College), 1:200; and rabbit anti-antisense oligonucleotide (provided by F. Rigo,
Isis Pharmaceuticals), 1:1,000. Next, samples were washed in PBX for 8 h at room
temperature (RT) and then incubated with secondary antibodies conjugated to
Alexa Fluor 546 or 488 (Life Technologies) in PBX + 10% NGS at 4 °C for 16 h.
The secondary antibodies were used at a dilution of 1:200. Samples were then
washed in PBX for 6 h and stained with 1 μM TO-PRO3 (Life Technologies)
for 10 min at RT.

For phalloidin staining, fixed eyes were incubated in PBX with Alexa Fluor 488 
Phalloidin (Life Technologies) at 1:20 for 16 h at 4 °C. The eyes were then washed 
in PBX for 1h at RT before mounting. 

For immunostaining of iPSC neurons, cells were grown on 12 mm coverglass
on top of a confluent monolayer of mouse astrocytes, fixed in 4% PFA, permeabilized in 0.3% Triton X-100/1X PBS and blocked in 10% normal donkey serum
before incubation with primary antibody. For human autopsied tissue, paraffin-embedded
motor cortex sections (see Supplementary Table 2) were washed in xylene (3×
5 min), then in a series of 100% ethanol (2× 5 min), 90% ethanol (1× 5 min) and
70% ethanol (1× 5 min), and washed with water (2× 5 min). Antigen retrieval
was performed using a steamer for 1 h in epitope retrieval solution (IHC World),
followed by washes with water (3× 5 min). Slides were treated with 50:50
methanol:acetone solution for 10 min and then washed with 1X PBS (2× 5 min).
Permeabilization was performed with 0.4% Triton X-100/1X PBS (8 min);
slides were then washed (1X PBS) and blocked overnight in 10% normal goat
serum/1X PBS. Primary antibodies were added and incubated for 24 h at 4 °C
(RanGAP1 Santa Cruz, sc-25630, 1:50; Nup205 Novus NBP1-91247, 1:50). For
DAB staining, tissues were incubated with biotinylated goat anti-rabbit secondary
antibody (Jackson Immunoresearch) at 1:200 for 1 h at room temperature and
then processed using the Vectastain Kit (Vector Labs) following the manufacturer's
instructions. Cells and tissue were then washed 5 times for 5 min with 1X
PBS and once with water and mounted using ProLong Antifade Gold
with DAPI.
Microscopy and image analysis. For fly experiments, samples were mounted in 
Vectashield (Vector Laboratories) and analysed under a confocal microscope (model 
LSM510; Carl Zeiss) with its accompanying software using Plan Apochromat 63x, 
NA 1.4 objectives (Carl Zeiss) at RT. Images were captured by AxioCam HRc camera 
(Carl Zeiss) or Hamamatsu Flash 4.0 (Hamamatsu). Images were processed using
ImageJ (National Institutes of Health). Deconvolution was performed using the 
Tikhonov-Miller method. 

For iPSC experiments, Z-stack images were taken on a Zeiss Axioimager with the
Apotome tool or on a Zeiss LSM700 (NIH Grant S10 OD016374) laser scanning
confocal microscope. All images were taken at matched exposure times or laser
settings and normalized within their respective experiment. All comparative
images were processed using identical settings.

Nuclear/cytoplasmic ratios were quantified using Z-stacks of iPSC neuronal
cultures on the Zeiss Axioimager with the Apotome tool. Full Z-stacks were taken
at 0.5 μm intervals, and the individual planes were then projected into maximum
intensity images, excluding any lower planes containing the astrocyte monolayer.
The nuclear region was determined using either DAPI or Lamin-B (Santa Cruz
sc-6217; 1:300) and the cytoplasmic fraction was determined using MAP2 (SySy
188 004; 1:1,000). Ran was visualized using a Ran antibody (BD Biosciences
610341; 1:100) or Ran-GFP (OriGene, RC204223L2). Images were quantified
using ImageJ (NIH), and the mean pixel intensity per μm² was determined to
generate the nuclear/cytoplasmic ratios. All iPSC lines were imaged at 50-70 DIV.
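The quantification above can be sketched with NumPy. This is not the ImageJ workflow used in the study. It assumes that plane 0 of each stack is the bottom (astrocyte) side, and that the cytoplasm is the MAP2-defined cell mask minus the nucleus. All function names are hypothetical.

```python
import numpy as np


def max_projection(stack: np.ndarray, n_bottom_planes_to_drop: int = 0) -> np.ndarray:
    """Maximum-intensity projection of a (z, y, x) stack.

    Assumption: plane 0 is the bottom of the stack, so dropping the first planes
    removes the astrocyte monolayer underneath the neurons.
    """
    return stack[n_bottom_planes_to_drop:].max(axis=0)


def nc_ratio(image: np.ndarray, nuclear_mask: np.ndarray, cell_mask: np.ndarray) -> float:
    """Nuclear/cytoplasmic ratio of mean pixel intensity for one neuron.

    nuclear_mask: DAPI- or Lamin-B-defined nucleus; cell_mask: MAP2-defined cell.
    The cytoplasm is taken as the cell mask minus the nucleus (an assumption).
    """
    nucleus = nuclear_mask.astype(bool)
    cytoplasm = cell_mask.astype(bool) & ~nucleus
    if not nucleus.any() or not cytoplasm.any():
        raise ValueError("empty nuclear or cytoplasmic region")
    # Mean intensity per pixel is proportional to intensity per um^2 at a fixed pixel size.
    return float(image[nucleus].mean() / image[cytoplasm].mean())


def normalize_to_control(ratios, control_ratios) -> np.ndarray:
    """Express N/C ratios relative to the mean of the control lines."""
    return np.asarray(ratios, dtype=float) / float(np.mean(control_ratios))
```

Normalizing to the control mean mirrors how the N/C Ran values in Fig. 3c are presented; the raw ratios (Fig. 3d) come straight from `nc_ratio`.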
FRAP analysis. FRAP analysis was performed as previously described, with
modifications. Control or C9-ALS iPSC neurons were transduced with a
lenti-CMV-NLS-tdTomato-NES construct provided by the Hetzer Laboratory (Salk
Institute) at 51 DIV. Cells expressing the tdTomato reporter were then imaged on
an LSM700 and processed with Zen software (Carl Zeiss). Three images were
taken of the tdTomato-expressing iPSC neurons, at which point the nucleus was
bleached for 30 iterations at 40-60% laser power, and recovery was monitored
every 3 s for 150 intervals. Recovery was normalized to the average of the
pre-bleach signals. To account for global bleaching, all post-bleach signals were also
normalized by a 'bleach factor' at each time point, determined as the
percentage of signal lost post-bleach in an unbleached transfected cell.
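One reading of the normalization described above is the standard double normalization. Each post-bleach nuclear value is divided by the pre-bleach mean, then by the fraction of signal an unbleached reference cell still retains at the same time point. The sketch below implements that reading. The function name is hypothetical, and the exact arithmetic used in the study may differ.

```python
import numpy as np


def normalize_frap(bleached, reference, n_prebleach: int = 3) -> np.ndarray:
    """Double-normalized nuclear recovery curve (post-bleach points only).

    bleached:  nuclear intensities of the photobleached cell (pre-bleach images first)
    reference: intensities of an unbleached transfected cell imaged in parallel
    """
    bleached = np.asarray(bleached, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if bleached.shape != reference.shape:
        raise ValueError("bleached and reference traces must have the same length")
    # Normalize to the average of the pre-bleach images
    recovery = bleached[n_prebleach:] / bleached[:n_prebleach].mean()
    # Fraction of signal the reference cell retains (1 - fraction lost to imaging)
    retained = reference[n_prebleach:] / reference[:n_prebleach].mean()
    return recovery / retained


# Time axis for 150 post-bleach intervals at 3 s spacing, as in the text.
time_s = np.arange(1, 151) * 3.0
```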
Purification of recombinant RanGAP1 from Escherichia coli. RanGAP1
cDNA was provided by S. Blackshaw. The cDNA was cloned into an SspI-digested
linear pET28a vector in frame with a 6XHis-EGFP N-terminal fusion using
Gibson assembly cloning strategies (NEB). A 50 ml LB starter culture of
RanGAP1 with ampicillin/chloramphenicol was grown overnight at 37 °C.
Then 25 ml of the overnight starter culture was added to 1 l of pre-warmed LB
with ampicillin/chloramphenicol and incubated at 37 °C until OD600 = 0.7-1.
The temperature was then dropped to 16 °C, and protein expression was induced
overnight at 16 °C with 1 mM IPTG. The cell
culture was then centrifuged at 4,000g for 20 min. The cell pellet was resuspended
in 50 ml of resuspension buffer (20 mM HEPES, 200 mM NaCl, 10 mM imidazole, 1 mM TCEP, pH 7.4) containing EDTA-free protease inhibitor (Roche).
Cells were lysed via French press on ice and then centrifuged at
30,000g for 30 min. The supernatant was collected, filtered through
a 0.45 μm membrane (Millex-HV Filter Unit) and then loaded onto an
ÄKTApurifier 10 superloop at 4 °C. A 5 ml HisTrap HP Ni Sepharose (GE)
column was pre-equilibrated with resuspension buffer before the supernatant
was passed through. The protein was eluted off the column in elution buffer
(20 mM HEPES, 200 mM NaCl, 500 mM imidazole, 1 mM TCEP, pH 7.4).
Imidazole was removed by passing the protein solution through a HiTrap
Desalting column (Sephadex G-25 Superfine, GE) pre-equilibrated in desalting
buffer (20 mM HEPES, 200 mM NaCl, 1 mM TCEP, pH 7.4). Removal of the
His-GFP-TEV tag was facilitated by incubating the protein solution with 75 units of
ProTEV Plus (5 U μl−1, Promega) per 2 ml of protein solution overnight while gently
rocking at 4 °C. ProTEV Plus and the cleaved His-GFP-TEV tag were removed by
reverse Ni-IMAC chromatography. The column flow-through was collected and
flash frozen in liquid nitrogen before storage at −80 °C. The flow-through was
checked by SDS-PAGE and Coomassie staining to determine the purity of
RanGAP1.

Electrophoretic mobility shift assays. A 24mer, 39mer or 60mer RNA (4 µM)
containing (G4C2)n, (C4G2)n or (CUG)n repeats with a 5′ Cy5 label (IDT) was
denatured at 95 °C for 5 min and then annealed in the presence or absence of
100 mM KCl in 10 mM Tris-HCl pH 7.4 to induce the formation of RNA
G-quadruplexes or hairpins, respectively. The RNA was diluted to 2 nM in binding
buffer (HEPES pH 7.5 with 100 mM KCl, 5 mM MgCl2, 50 µM ZnCl2, 1 mM TCEP and
0.01% IGEPAL) and then incubated for 30 min at room temperature with varying
concentrations of recombinant RanGAP1 (0, 1, 2, 10, 20, 50, 100 and 200 nM) in
binding buffer. Samples were then loaded onto a 0.8% agarose gel in 1× TAE
(pH 8.0) and electrophoresed for 45 min at 60 V. Bands were visualized on a
Typhoon imager using Cy5 excitation and emission settings. The image was analysed
and quantified in ImageJ and then plotted in GraphPad Prism. RanGAP1 binding data
were fit by both hyperbolic and linear regression, and, based on the better curve
fit, the k1/2 was calculated with Bmax set to 1 for the nonlinear regression.
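The model comparison described above can be reproduced outside GraphPad Prism. The sketch below fits each EMSA titration with a hyperbola whose Bmax is fixed at 1 and with a straight line, then keeps whichever fit has the higher r². It assumes free RanGAP1 is approximately equal to total RanGAP1, as the hyperbolic model does. The golden-section search over log10(k1/2) stands in for Prism's nonlinear least-squares fitter and is not Prism's algorithm. All function names are hypothetical, and the fractions in the usage line are synthetic.

```python
import math


def hyperbolic(conc_nM, k_half_nM):
    """Fraction bound for one-site binding with Bmax fixed at 1."""
    return conc_nM / (k_half_nM + conc_nM)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def fit_hyperbolic(concs, fractions, lo=1e-3, hi=1e5, iters=200):
    """Least-squares k1/2 via golden-section search over log10(k1/2)."""
    def sse(log_k):
        k = 10.0 ** log_k
        return sum((f - hyperbolic(c, k)) ** 2 for c, f in zip(concs, fractions))

    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    x1, x2 = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(x1) < sse(x2):   # minimum lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - g * (b - a)
        else:                   # minimum lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + g * (b - a)
    k_half = 10.0 ** ((a + b) / 2.0)
    return k_half, r_squared(fractions, [hyperbolic(c, k_half) for c in concs])


def fit_linear(concs, fractions):
    """Ordinary least-squares line: fraction = intercept + slope * conc."""
    n = len(concs)
    mx, my = sum(concs) / n, sum(fractions) / n
    sxx = sum((x - mx) ** 2 for x in concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs, fractions))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, r_squared(fractions, [intercept + slope * x for x in concs])


def choose_model(concs, fractions):
    """Keep whichever model has the higher r^2."""
    k_half, r2_hyp = fit_hyperbolic(concs, fractions)
    _, _, r2_lin = fit_linear(concs, fractions)
    return ("hyperbolic", k_half) if r2_hyp >= r2_lin else ("linear", None)


# RanGAP1 titration used in the EMSAs (nM); the fractions are synthetic.
CONCS = [0, 1, 2, 10, 20, 50, 100, 200]
synthetic = [hyperbolic(c, 40.0) for c in CONCS]
print(choose_model(CONCS, synthetic))  # hyperbolic, k1/2 close to 40 nM
```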

Competition experiments were performed by incubating 2 nM RNA (final
concentration) with 100 nM RanGAP1 for 30 min at room temperature as
above. Unlabelled competitor, antisense oligonucleotide control, antisense
oligonucleotide, RNaseH-dependent antisense oligonucleotide or yeast tRNA
was then added to the sample at increasing concentrations and incubated
at room temperature for an additional 30 min. Samples were then analysed
as above.

The effect of the porphyrin TMPyP4 on RanGAP1 binding to the RNA was
assessed essentially as above. RNA (2 nM) was incubated with TMPyP4 serially
diluted tenfold from a 1 µM final concentration in binding buffer. After
30 min incubation, 10 nM RanGAP1 was added and incubated for an additional
30 min; binding was analysed as described above.
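The TMPyP4 titration is a tenfold serial dilution starting from 1 µM. The Methods do not state how many dilution steps were used, so the four-point series below is a hypothetical choice. The sketch lists the resulting final concentrations and the TMPyP4:RNA molar ratio at the stated 2 nM RNA. Function names are illustrative only.

```python
RNA_nM = 2.0  # RNA concentration stated in the Methods


def tenfold_series_nM(top_nM=1000.0, n_points=4):
    """Final concentrations (nM) of a tenfold serial dilution from top_nM.

    n_points is an assumption; the Methods do not give the number of steps.
    """
    return [top_nM / 10 ** i for i in range(n_points)]


def molar_ratio_to_rna(concs_nM, rna_nM=RNA_nM):
    """TMPyP4:RNA molar ratio at each titration point."""
    return [c / rna_nM for c in concs_nM]


series = tenfold_series_nM()
print(series)                      # [1000.0, 100.0, 10.0, 1.0]
print(molar_ratio_to_rna(series))  # [500.0, 50.0, 5.0, 0.5]
```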

Protein extraction, protein/RNA pull down and immunoblot. Tissues or cells
were homogenized and/or lysed in RIPA buffer (50 mM Tris-HCl pH 7.4,
150 mM NaCl, 0.1% SDS, 0.5% sodium deoxycholate and 1% Triton X-100)
supplemented with protease inhibitor cocktail (Complete, Roche). For pull down,
cells were lysed in lysis buffer (50 mM Tris pH 7.4, 150 mM NaCl, 1% NP-40 and
5 mM MgCl2) for 30 min on ice. The lysate was then pre-cleared using
avidin-agarose beads (Life Technologies) for 30 min before being incubated with
biotinylated G4C2-repeat RNA with 10 mM TCEP and the RNase inhibitor RNaseOUT
(Life Technologies)’. The protein/RNA mixture was then incubated with
avidin-agarose beads overnight at 4 °C. The beads were subsequently pelleted by
centrifugation at 1,500g for 3 min and washed three times in lysis buffer at
4 °C for a total time of 1 h. The beads were then resuspended in 50 µl lysis
buffer and subjected to immunoblot analysis.

For immunoblot, the sample was mixed with Laemmli buffer and heated at
98 °C for 10 min. The protein samples were run on 4–15% SDS Mini-PROTEAN
TGX Precast Gels (Bio-Rad) and transferred to nitrocellulose membrane. For dot
blot, 2 µl of sample was blotted on nitrocellulose membrane and air-dried for
15 min. TBST with 5% milk was used for blocking. Primary antibodies were used
as follows: rat anti-HA (Roche), 1:1,000; chicken anti-GFP (Abcam), 1:1,000;
mouse anti-actin (Millipore), 1:5,000; rabbit anti-GP (a gift from L. Petrucelli,
Mayo Clinic), 1:1,000; and rabbit anti-GR (Proteintech), 1:1,000. The
HRP-conjugated secondary antibody (Jackson ImmunoResearch) was used at 1:5,000
dilution.
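The working dilutions above translate directly into pipetting volumes. This sketch assumes a 1:N dilution means one part stock in N parts final volume. It also assumes a hypothetical 10 ml incubation volume, which the Methods do not specify.

```python
# Working dilutions listed in the Methods (antibody -> fold dilution).
DILUTIONS = {
    "rat anti-HA": 1_000,
    "chicken anti-GFP": 1_000,
    "mouse anti-actin": 5_000,
    "rabbit anti-GP": 1_000,
    "rabbit anti-GR": 1_000,
    "HRP secondary": 5_000,
}


def antibody_volume_ul(fold_dilution, final_volume_ml):
    """Microlitres of antibody stock for a 1:fold_dilution working solution."""
    return final_volume_ml * 1000.0 / fold_dilution


for name, fold in DILUTIONS.items():
    # 10 ml is a hypothetical incubation volume, not taken from the Methods.
    print(f"{name}: {antibody_volume_ul(fold, 10.0):.1f} ul in 10 ml")
```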

Drug feeding assay. Molten cornmeal-molasses-yeast fly food was mixed at high
temperature with the indicated concentrations of antisense oligonucleotide 573674
‘CCGGCCCCGGCCCCGGCCCC’ (Isis Pharmaceuticals), TMPyP4 (a porphyrin derivative;
Sigma) or KPT-276 (Selleckchem) and then cooled to room temperature. PBS was used
as the vehicle control for the antisense oligonucleotide, and DMSO was used as the
vehicle control for TMPyP4 and KPT-276. Parent flies were crossed on
drug-supplemented food, and the offspring were raised on the same food.
Wandering third-instar larvae of the offspring were selected and subjected to GFP
staining. Antisense oligonucleotides were detected using the anti-antisense
oligonucleotide (13545) antibody (Isis Pharmaceuticals), which detects the MOE
modification. Adult flies were aged on the drug-containing food for 15 days
before their eye morphology was analysed.

Statistics. No statistical methods were used to predetermine sample size. For
quantification of outer eye morphological defects, ten flies were quantified. For
quantification of rhabdomere defects, 20 ommatidia from three or four flies were
quantified for each genotype except RanGAP overexpression; for RanGAP
overexpression, 24 ommatidia from four flies were quantified. For active zone
quantification, eight NMJs from four animals were quantified. For NMJ recording,
the following numbers of animals were used for quantification: 18 for control,
10 for G4C2-expressing and 6 for RanGAP overexpression. For S2 cell
quantification, ten cells were quantified for each genotype. For image
quantification of iPSC neurons, at least 31 neurons per line were quantified for
all analyses, and each cell line was differentiated and analysed at least twice
at 55–70 DIV (as indicated) (see Supplementary Data Table 4). For qRT-PCR, six
biological repeats, each containing three technical replicates run in parallel,
were used for quantification. For salivary gland quantification, eight or nine
salivary gland cells from three or four flies were quantified for each genotype.
The numbers of flies used in the flight assay are labelled individually on the
bar graph in Extended Data Fig. 1. Error bars represent s.e.m. For analyses of
data sets with two groups, we used a two-tailed Student's t-test. For analyses
of data sets with three or more groups, we used a one-way ANOVA assuming a
Gaussian distribution, with Tukey's post hoc test for multiple comparisons. To
assess correlation between N/C ratios, Pearson's correlation coefficient was
calculated. To obtain r² values, a nonlinear regression curve fit assuming
one-phase association was performed. A P value < 0.05 was considered
statistically significant for all tests (GraphPad Prism, version 6.0b).
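The sketch below shows the core quantities behind these tests: s.e.m. for error bars, the pooled-variance Student's t statistic, and Pearson's r. It uses only Python's standard library. Turning t into a P value needs the t distribution, which is not shown here. In practice, SciPy's `scipy.stats.ttest_ind`, `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` cover the t-test, one-way ANOVA and Tukey step. This is not the authors' Prism workflow, and the numbers are toy values.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * statistics.variance(group_a)
              + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Toy numbers for illustration only; not data from this study.
print(round(sem([1, 2, 3, 4, 5]), 4))  # 0.7071
print(student_t([1, 2, 3], [4, 5, 6]))
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```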
Extended Data Figure 1 | Genetic interaction between G4C2 repeats and
components of the nucleocytoplasmic transport machinery. a, External eye
morphology of 1-day-old (left column) and 15-day-old (left column) flies.
Phalloidin staining of the retina of newly eclosed (middle column, magnified in
right column) and 15-day-old (middle column, magnified in right column) flies
is shown. Flies express 30 G4C2 repeats together with (from top row)
RanGAP (GOE), RanGAP RNAi, RanGEF overexpression, RanGEF RNAi, importin-α2
overexpression or exportin RNAi. Genotypes (from top row): (1) GMR-GAL4,
UAS-(G4C2)30/RanGAP (GOE); (2) GMR-GAL4, UAS-(G4C2)30/+; UAS-RanGAP RNAi/+;
(3) GMR-GAL4, UAS-(G4C2)30/+; UAS-RanGEF/+; (4) GMR-GAL4, UAS-(G4C2)30/UAS-RanGEF
RNAi; (5) GMR-GAL4, UAS-(G4C2)30/UAS-imp-α2; (6) GMR-GAL4, UAS-(G4C2)30/+;
UAS-Exportin RNAi/+ (BL31353). b, c, Quantification of G4C2 mRNA levels by
qRT-PCR. d, Flight assay. The top of the graduated cylinder is ‘0’, and thus
decreased landing height represents better flight ability. Genotypes (from left
lane): (1) and (2) UAS-(G4C2)30/+; elavGS-GAL4/+; (3) UAS-(G4C2)30/+;
elavGS-GAL4/UAS-RanGAP. Number of flies (n) tested is indicated in each column.
*P < 0.05; **P < 0.01.
Extended Data Figure 2 | RanGAP does not rescue developmental defects
caused by G4C2 repeats. a, Staining of the active zone component Bruchpilot
(Brp) was used to identify active zones in the type Ib NMJ of muscle 4 in
abdominal segments 3 and 4. b, Quantification of active zone number.
c, Electrophysiological recording of NMJ in muscle 6/7 of abdominal segments
3 and 4. d–g, Evoked junctional potential (EJP) (d), miniature EJP (mEJP)
amplitude (e), quantal content (f) and mEJP frequency (g) are shown.
Genotypes: (1) Ctrl, OK371-GAL4/+; (2) (G4C2)30, OK371-GAL4/+;
UAS-(G4C2)30/+; (3) (G4C2)30 RanGAP OE, OK371-GAL4/+; UAS-(G4C2)30/UAS-RanGAP.
*P < 0.05; **P < 0.01; ****P < 0.0001.
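The legend does not say how quantal content (f) was computed. A common direct estimate, assumed here, divides the mean EJP amplitude by the mean mEJP amplitude. Many laboratories also correct EJPs for nonlinear summation, and that correction is omitted from this sketch. The amplitudes below are hypothetical.

```python
import statistics


def quantal_content(ejp_amplitudes_mV, mejp_amplitudes_mV):
    """Direct-method estimate: mean EJP amplitude / mean mEJP amplitude.

    No correction for nonlinear summation is applied here.
    """
    return statistics.mean(ejp_amplitudes_mV) / statistics.mean(mejp_amplitudes_mV)


# Hypothetical amplitudes from one NMJ (mV); not data from this study.
print(quantal_content([38.0, 40.0, 42.0], [0.9, 1.0, 1.1]))
```

With these toy values the estimate is about 40 quanta per evoked response.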
Extended Data Figure 3 | Dot blot of GR and GP dipeptide proteins. Dot
blot of GR (a) and GP (b) compared with actin control. hs indicates heat-shock
GAL4; a heat shock was required to induce detectable polyGR, as
described’. A transgenic line, UAS-(G4C2)35, previously shown to generate
polyGR and polyGP DPRs under certain conditions, was used as a positive
control’.
Extended Data Figure 4 | RanGAP/RanGAP1 binds to G4C2 repeats.
a, SDS-PAGE showing purified human RanGAP1. b, EMSA for RanGAP1
with (CUG)20, (C4G2)10 or (G4C2)10 RNA hairpins. c, EMSA for RanGAP1
with increasing length of repeats that were annealed in the presence of K+
to promote RNA G-quadruplex formation. d, Plot of the fraction bound from
the EMSAs performed with RanGAP1 and the RNA repeats shown in b and
c. Similar RNA nucleotide lengths but different binding preferences indicate
that RanGAP1 has a structure- and sequence-dependent RNA binding mode
(top panel). All data were fit using hyperbolic and linear regression, and the
RanGAP1 binding model was then determined from the r² values of the best fit
(n = 2). The length-dependent binding of RanGAP1 fits best to a hyperbolic
regression, which demonstrates specific binding to the (G4C2)n G-quadruplex
conformation, and the fraction bound increases with increasing nucleotide
length (bottom panel). The fraction bound for the RNA hairpins fits best to a
linear regression, which indicates nonspecific or less specific binding to
RanGAP1. The k1/2 values for specific binding of RanGAP1 to the
G-quadruplex RNA conformation are 162, 39 and 11 nM for (G4C2)4, (G4C2)6.5
and (G4C2)10, respectively. e, The RanGAP1-(G4C2)10 RNA G-quadruplex
complex is resistant to nonspecific RNA competitors and antisense
oligonucleotides (n = 1).
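To make the reported k1/2 values concrete, the hyperbolic (Bmax = 1) model predicts the fraction of RNA bound at any RanGAP1 concentration. The sketch evaluates it at 100 nM, the RanGAP1 concentration used in the competition experiments. These are model predictions from the reported k1/2 values, not measured fractions. The function name is illustrative.

```python
# k1/2 values (nM) reported in this legend for the G-quadruplex conformation.
K_HALF_NM = {"(G4C2)4": 162.0, "(G4C2)6.5": 39.0, "(G4C2)10": 11.0}


def fraction_bound(protein_nM, k_half_nM):
    """Hyperbolic (Bmax = 1) prediction of the fraction of RNA bound."""
    return protein_nM / (k_half_nM + protein_nM)


for repeat, k_half in K_HALF_NM.items():
    # 100 nM RanGAP1 is the concentration used in the competition experiments.
    print(repeat, round(fraction_bound(100.0, k_half), 3))
# (G4C2)4 0.382
# (G4C2)6.5 0.719
# (G4C2)10 0.901
```

At a protein concentration equal to k1/2 the model gives exactly half of the RNA bound, which is what the k1/2 parameter means under Bmax = 1.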
Extended Data Figure 5 | RanGAP/RanGAP1 is mislocalized in C9-ALS S2
and iPS cells. a, RanGAP mislocalization with (G4C2)30 expression is not
caused by apoptosis. S2 cells transfected with RanGAP-HA (first column) or
RanGAP-HA and (G4C2)30 (second column) were co-stained with HA (red),
cleaved Dcp-1 (green) and TO-PRO3 (blue). As a control, S2 cells treated with
DMSO (third column) or actinomycin (right column) were co-stained with
cleaved Dcp-1 (green) and TO-PRO3 (blue). b, S2 cells transfected with G4C2
were co-stained with a Ran antibody (red) and TO-PRO3 (blue). c, Abnormal
aggregated RanGAP1 is variably observed in C9-ALS iPSC neurons and is
largely absent from control iPSC neurons. Arrows indicate abnormal RanGAP1
staining. d, Single microscopic plane of aggregated RanGAP1 co-localized with
Nup205 at the nuclear membrane (Lamin B) in C9-ALS iPSC neurons. Single
immunolabel views for Nup205 and RanGAP1, with x-y and x-z projections, are
shown in the right panels. e, Cytoplasmic RanGAP1 aggregates can co-localize
with ubiquitin in C9-ALS iPSC neurons.
Extended Data Figure 6 | Electrophysiological and immunocytochemical
characterization of iPSC neurons and astroglia. a, IR-DIC images of iPSC
neurons from control (left panel) and C9orf72 (right panel) patient cells
(a′). Representative action potentials in response to somatic current injections
(70 pA) in iPSC neurons (b′–d′). The majority of cells from both groups
displayed either single, adaptive or repetitive responses, as demonstrated
previously’. These action potentials were blocked by TTX treatment. Normal
(e′) and C9orf72 (f′) patient cells displayed mEPSCs that were sensitive to
NBQX treatment, suggesting functional synaptic input. Resting membrane
potential, membrane capacitance and membrane resistance were comparable
in both groups (g′–i′). b, Quantification of iPSC neuron markers showing
glutamatergic and Islet-1+ iPSC neurons. c, Neurons differentiated from iPSCs
express phenotypic markers such as Islet-1, HB9 and ChAT (choline
acetyltransferase) (motor neuron); Tuj1, MAP2 and SMI32 (cytoskeletal); VGLUT1
(vesicular glutamate transporter 1); NMDAR1 (NMDA receptor); and the synaptic
markers SYT1 (synaptotagmin) and SYP (synaptophysin). d, Astroglia markers
include ALDH1 (universal astroglial marker) and GFAP (reactive astroglia).
Extended Data Figure 7 | Additional human RanGAP1 and Nup107 pathology in
C9-ALS brain. a, b, C9orf72 motor cortex (b) reveals aberrant nuclear
localization of RanGAP1 compared to non-C9 control tissue (a), including
various nuclear aggregate pathologies seen at higher power in C9orf72 ALS
motor cortex (d) as compared to control (c). e, Aberrant RanGAP1 nuclear
aggregates were not readily observed in the C9-ALS cerebellar cortex molecular
layer (ML), Purkinje cells (PK) or granule cell layer (GL) when compared to
non-C9-ALS control cerebellum. The number in the upper right of each panel
identifies the autopsy specimen (Supplementary Table 2). f, Nup107 was also
aggregated at the nuclear membrane in C9-ALS motor cortex cells when compared
to non-C9 control tissues.
Extended Data Figure 8 | C9orf72 HRE disrupts the cytoplasmic/nuclear
Ran gradient. a, Representative images of disrupted N/C Ran gradient in
C9-ALS ChAT+ iPSC neurons. b, c, Representative images and quantification of
control (top row) or C9-ALS iPSC neurons (bottom row) expressing Ran-GFP
that are co-stained with Ran and MAP2. Both the Ran antibody and Ran-GFP
indicate a reduced N/C Ran ratio. d, Overexpression of RanGAP1-GFP rescues
the N/C Ran ratio in C9-ALS iPSC neurons. e, Control iPSC neurons treated with
tunicamycin show an enhanced level of activated caspase 3 in the soma but no
change in N/C Ran localization compared to vehicle-treated controls.
f, RanGAP1 is not aggregated in control and C9-ALS iPSC astroglia.
g, Representative image of N/C Ran in C9-ALS astrocytes identified using the
pan-astroglial marker ALDH1. h, N/C Ran is not altered in C9-ALS astroglia when
comparing astrocytes of a similar size. i, Mean intensity of fluorescence (MIF)
of nuclear Ran does not differ between control and C9-ALS astroglia.
j, Representative image of a C9-ALS iPSC neuron with G4C2 RNA foci, which are
present in approximately 40% of MAP2+ neurons at 50–70 DIV. C9orf72
RNA-targeting antisense oligonucleotides reduce the proportion of C9-ALS iPSC
neurons with RNA foci to <10%, compared with scrambled/non-targeting antisense
oligonucleotides. k, Antisense oligonucleotides that reduce G4C2 RNA foci also
enhance N/C Ran and N/C TDP-43 ratios.

*P < 0.05; **P < 0.01; ****P < 0.0001.
Extended Data Figure 9 | C9orf72 HRE causes nucleocytoplasmic transport
defects. a, Quantification of the nuclear GFP intensity in Fig. 4a.
b, Immunoblot of the GFP levels in Fig. 4a. c, Quantification of the TBPH N/C
ratio in Fig. 4a. d, Wild-type control and (G4C2)30-expressing motor neurons
expressing NLS-NES-GFP (left two columns) or NLS-ΔNES-GFP (right two columns)
co-stained with a GFP antibody (green) and TO-PRO3 (blue) (top row). The GFP
signal is shown separately in the bottom row. Genotypes (from left):
(1) OK371-GAL4/UAS-NLS-NES-GFP (II); (2) OK371-GAL4/UAS-NLS-NES-GFP;
UAS-(G4C2)30/+; (3) OK371-GAL4/+; UAS-NLS-NES(P12)-GFP/+; (4) OK371-GAL4/+;
UAS-NLS-NES(P12)-GFP/UAS-(G4C2)30.
Extended Data Figure 10 | Model of C9orf72 mutation-induced
nucleocytoplasmic transport disruption. a, In normal cases, RanGAP1 is
tethered onto the NPC via RanBP2, where it activates Ran·GTP hydrolysis to
produce Ran·GDP. Ran·GDP dissociates from and activates the importin-α/β
complex to import NLS-NES-containing protein cargoes such as TDP-43. b, In the
nucleus, RanGEF converts Ran·GDP to Ran·GTP, which is required for the
dissociation of the NLS-importin-α/β complex and the export of NES protein
cargoes. c, In C9-ALS, the G4C2 HRE binds and sequesters RanGAP1, leading to an
increase in cytoplasmic Ran·GTP. High cytoplasmic Ran·GTP prevents the
formation of the NLS-importin-α/β complex, thereby disrupting the N/C Ran
gradient and impairing nuclear import of NLS-containing proteins. d, Dipeptide
repeat proteins translated from the G4C2 RNA can be toxic when expressed at
high levels, but it is unclear whether they contribute to nucleocytoplasmic
trafficking deficits in Drosophila, since they are not detected at the time of
degeneration. The C9orf72 HRE sense strand appears to contribute to
nucleocytoplasmic trafficking deficits in human iPSC neurons and fly model
systems, as small molecules and antisense oligonucleotides targeting the sense
RNA substantially suppress the nuclear import phenotypes and neurodegeneration
resulting from G4C2 repeat RNA expression. Overall, the data are most
consistent with an RNA-mediated mechanism, with evidence that includes:
(1) RanGAP1 was identified as 1 of 19 sequence-specific interactors of G4C2
RNA; (2) RanGAP is a strong genetic modifier of G4C2 RNA-mediated degeneration
in Drosophila under conditions in which polyGR and polyGP are not detected;
(3) RanGAP directly and potently interacts with HRE RNA; and (4) G4C2 RNA foci
can co-localize with RanGAP1.
Architecture of the synaptotagmin-SNARE machinery for neuronal exocytosis


Qiangjun Zhou, Ying Lai, Taulant Bacaj, Minglei Zhao, Artem Y. Lyubimov, Monarin Uervirojnangkoorn,
Oliver B. Zeldin, Aaron S. Brewster, Nicholas K. Sauter, Aina E. Cohen, S. Michael Soltis, Roberto Alonso-Mori,
Matthieu Chollet, Henrik T. Lemke, Richard A. Pfuetzner, Ucheor B. Choi, William I. Weis, Jiajie Diao,
Thomas C. Südhof & Axel T. Brunger


Synaptotagmin-1 and neuronal SNARE proteins have central roles in evoked synchronous neurotransmitter release;
however, it is unknown how they cooperate to trigger synaptic vesicle fusion. Here we report atomic-resolution crystal
structures of Ca2+- and Mg2+-bound complexes between synaptotagmin-1 and the neuronal SNARE complex, one of
which was determined with diffraction data from an X-ray free-electron laser, leading to an atomic-resolution structure
with accurate rotamer assignments for many side chains. The structures reveal several interfaces, including a large,
specific, Ca2+-independent and conserved interface. Tests of this interface by mutagenesis suggest that it is essential
for Ca2+-triggered neurotransmitter release in mouse hippocampal neuronal synapses and for Ca2+-triggered vesicle
fusion in a reconstituted system. We propose that this interface forms before Ca2+ triggering, moves en bloc as Ca2+
influx promotes the interactions between synaptotagmin-1 and the plasma membrane, and consequently remodels the
membrane to promote fusion, possibly in conjunction with other interfaces.


Membrane fusion is essential for many physiological processes in 
eukaryotic cells, including protein and membrane trafficking, hor- 
mone secretion and neurotransmitter release’”. Evolutionarily con- 
served SNARE (soluble N-ethylmaleimide sensitive factor attachment 
protein receptor) proteins have a key role in these processes. Specific 
combinations of SNARE proteins are located on opposite membranes. 
Upon zippering into a highly stable four-helix bundle—the SNARE 
complex—they provide the energy for membrane fusion**. However, 
other factors are essential for regulation of membrane fusion. In par- 
ticular, several proteins are required for neurotransmitter release in 
addition to neuronal SNAREs’, but it is unknown, at the atomic level, 
how these factors cooperate with SNAREs to promote synaptic
transmission. One key factor is the Ca2+ sensor synaptotagmin, which
consists of a short N-terminal luminal segment, a single transmembrane
α-helix, an unstructured linker, and two Ca2+-binding C2
domains, termed C2A and C2B, respectively (or C2AB together)°.
There are 16 isoforms of mammalian synaptotagmins that are
localized to synaptic and secretory vesicles or the plasma membrane.
Among these isoforms, synaptotagmin-1 (Syt1) is a Ca2+ sensor for
evoked synchronous neurotransmitter release’. Synaptotagmin-2 and
synaptotagmin-9 are also involved in evoked synchronous
neurotransmitter release for different subsets of neurons*. In contrast,
synaptotagmin-7 plays a part in ‘slower’ asynchronous release”’®;
moreover, these and other synaptotagmins act in other types of exo- 
cytosis’. In addition to its role in evoked synchronous release, Syt1 also 
clamps the frequency of miniature spontaneous events’*”’. 

Syt1 binds in a Ca2+-dependent manner to anionic membranes;
during binding, anionic phospholipids and synaptotagmin C2
domains together coordinate calcium ions'*"’. The
membrane-synaptotagmin interaction has functional significance, since the
Ca2+ affinity of Syt1 for binding to anionic membranes and the
Ca2+ sensitivity of neurotransmitter release are tightly correlated'*"*.
The Syt1 C2AB fragment can induce vesicle clustering’’ and
preferentially binds to curved membranes””’. Moreover, C2 domains may
penetrate the membrane upon Ca2+ binding”.

Syt1 also interacts with the neuronal SNARE complex based on 
immunoprecipitation and pull-down experiments*”, single mole- 
cule fluorescence resonance energy transfer (smFRET)**, and nuclear 
magnetic resonance” experiments. A gain-of-function mutation in 
the Ca2+-binding region of the C2A domain suggested that the
Syt1-SNARE interaction may be functionally important”, but the
molecular basis and the significance of the interaction between Syt1 and the
SNARE complex remain unknown. 

Several crystal structures of Syt1 C2A and C2B domains, and C2AB
fragments, are available*’~*’, as well as the structure of the neuronal
SNARE complex’; however, the atomic-resolution structure of the
complex between Syt1 and the neuronal SNARE complex (referred
to as Syt1-SNARE complex) has been elusive. Single molecule
methods allowed the study of the Syt1-SNARE complex under dilute
conditions with spatially isolated neuronal SNARE complexes
reconstituted in a supported bilayer**. The observed smFRET histograms”
suggested several possible interfaces between Syt1 and the SNARE
complex. Other dynamic or approximate models of the C2AB-SNARE
complex were obtained by nuclear magnetic resonance
(NMR)”**°, but cannot be readily compared with the previous
smFRET studies” or the results presented here because of differences
in conditions, in particular covalent attachment of lanthanide labels”®,
and lack of atomic resolution.

Here we report atomic-resolution crystal structures of a
Syt1-SNARE complex in two different crystal forms and in the presence
of either Ca2+ or Mg2+. We found several interfaces, including a large,
structurally and evolutionarily conserved interface that is
Ca2+-independent. Structure-based mutations of this interface disrupt evoked
neurotransmitter release in primary neurons and Ca2+-triggered
fusion in a reconstituted system.


Structure of the Syt1-SNARE complex


We designed and tested several chimaeric constructs involving the
Syt1 C2AB fragment (amino acids 141-421) and the neuronal SNARE
complex (Extended Data Fig. 1a, b and Methods). We crystallized the
Ca2+- and Mg2+-bound Syt1-SNARE complexes (Extended Data
Fig. 1d, e) and determined their structures (Fig. 1 and Extended
Data Table 1). The crystallization conditions were at near-physiological
pH and ionic strength (Methods and Extended Data Fig. 1e). We
observed two crystal forms for the Ca2+-bound Syt1-SNARE complex,
referred to hereafter as the ‘short unit cell’ and ‘long unit cell’ crystal forms
(Extended Data Figs 2 and 3).

The X-ray free-electron laser (XFEL) of the Linac Coherent Light 
Source (LCLS) at SLAC National Accelerator Laboratory yielded sub- 
stantially higher-quality diffraction data than the Advanced Photon 
Source (APS) NE-CAT microfocus synchrotron beamline at Argonne 
National Laboratory from similar crystals of the long unit cell crystal 
form (Extended Data Fig. 2a, b). The electron density maps obtained 
from the XFEL diffraction data were notably superior to those of the 
synchrotron data sets (Methods). In particular, the electron density 
maps calculated from the XFEL data set were of sufficient quality to 
obtain accurate rotamers for most side chains, including those at the 
interfaces between molecules (Extended Data Fig. 2d-f). This is one of 
the first new crystal structures determined using XFEL diffraction
data. Moreover, in contrast to the tens of thousands to millions of
crystals typically used in XFEL-based crystallography experiments”,
we obtained a reasonably complete data set from a few hundred
images captured from 72 of the 148 crystals exposed to the LCLS
XFEL beam (Methods).

[Figure 1 panel labels: C2A-C2B interface (465 Å²); primary interface (720 Å²); secondary interface]

Figure 1 | Crystal structure of the Syt1-SNARE complex. a, Structure of the
Ca2+-bound Syt1-SNARE complex (only showing complex I) in the long
unit cell crystal form (Extended Data Table 1 and Extended Data Fig. 2). Two
Syt1 C2B domains (designated as I and I′) and one Syt1 C2A domain (related
by crystallographic symmetry and coloured in grey) form a total of three
interfaces (primary, secondary and tertiary) with the neuronal SNARE complex
(synaptobrevin-2, syntaxin-1A and the two SNARE domains of SNAP-25A
(SNAP-25_N and SNAP-25_C)). A fourth interface (C2A-C2B interface) is
located between Syt1 C2B (I′) and Syt1 C2A (I′). Interface areas are provided
in parentheses. b, Close-up views of the four interfaces with labels for side
chains of interacting residues. The left panel shows a superposition of both
primary interfaces that occur in the long unit cell crystal form
(root-mean-square difference (r.m.s.d.) = 0.34 Å, including Cα atoms of the
SNARE complex and the Syt1 C2B domain forming the interface). The middle panel
shows the secondary interface. The right panel shows both the tertiary interface
and the C2A-C2B interface. c, Rotated view of panel a, but showing the
entire asymmetric unit and the symmetry-related Syt1 C2AB fragment. Three
Syt1 C2AB fragments (designated as I, I′ and II) bind to two SNARE complexes
in the asymmetric unit. SNARE (II) only interacts with the C2B domain of
one Syt1 C2AB fragment, Syt1 (II), via the same primary interface as observed
in complex I. d, A schema corresponding to the structure shown in panel c.


Three distinct Syt1-SNARE interfaces


The crystal structures of the Syt1-SNARE complex reveal three
interfaces between the SNARE complex and the Syt1 C2A and C2B
domains (referred to as ‘primary’, ‘secondary’ and ‘tertiary’), as well
as an interface between Syt1 C2 domains (referred to as the C2A-C2B
interface; Fig. 1a, b and Supplementary Videos 1 and 2). There are two
essentially identical instances of the primary interface formed
between Syt1 C2B domains and the SNARE complex in the long unit
cell crystal form (Fig. 1c, d). The secondary interface involves another
Syt1 C2B domain and the SNARE complex, while the tertiary
interface involves a Syt1 C2A domain and the SNARE complex (Fig. 1b).
These two C2 domains also form the C2A-C2B interface. All three
interfaces between Syt1 and the SNARE complex fall within the range
of smFRET efficiency histograms obtained previously”* (Extended
Data Fig. 4). These interfaces may suggest how multiple Syt1 and
SNARE complexes simultaneously interact in the neuron (see below).

The largest, primary interface between the Syt1 C2B domain and
the SNARE complex is very similar in both complexes (Fig. 1b) and in
both crystal forms (Extended Data Fig. 5a), suggesting that it is not
affected by crystal packing. The primary interface is also very similar
in the Ca2+-bound and the Mg2+-bound crystal structures
(Extended Data Fig. 5d), implying a Ca2+-independent interface. The
residues involved in the primary interface have relatively low
temperature factors, among the lowest in these structures (Extended Data
Fig. 5e), and the electron densities of the side chains that form this
interface are well defined (Fig. 2c, d), suggesting a specific interaction.
In a recent study, lanthanide labels were covalently attached to
SNAP-25 residues 41 and 166 for pseudo-contact chemical shift NMR
measurements between Syt1 and the SNARE complex in the presence of
125 mM thiocyanate”; these covalent labels would probably disrupt
the primary interface (Fig. 2c), so it is not possible to compare our
crystal structures with this NMR study. The interacting residues of the
primary interface are conserved across different species for
synaptotagmins (Syt1, Syt2, Syt9; ref. 8), SNAP-25 and syntaxin-1A
homologues involved in fast synchronous release (Extended Data Fig. 6). In
contrast, the interacting residues show variation among other
synaptotagmin, SNAP-25 and syntaxin homologues that are not known to
be involved in fast synchronous release.

Figure 2 | The primary interface between Syt1 and the SNARE complex.
a, Overview of the primary interface (complex I in the long unit cell crystal
form) along with interacting residues (stick and ball representation).
b, Open-book view of the electrostatic potential map of the primary interface.
The two polar regions I and II are connected by a hydrophobic patch (SNAP-25
I44, L47 and V48 and Syt1 V292, L294 and A402). c, d, Close-up views of
regions I and II. Interacting residues are labelled, along with dashed lines
that indicate hydrogen bonds or salt bridges. 2mFo − DFc electron density maps
of the interacting residues are superimposed (grey mesh; contour level = 1.5σ).
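The 0.34 Å r.m.s.d. quoted for the two superposed primary interfaces (Fig. 1b) is the square root of the mean squared distance between matched Cα atoms. The sketch assumes the two coordinate sets are already superposed and listed in the same atom order. The superposition step itself, usually a least-squares fit such as the Kabsch algorithm, is not shown. The coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two already superposed,
    equally ordered coordinate sets (e.g. matched C-alpha atoms).
    Units follow the input (angstroms for PDB coordinates)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: each atom is displaced by 0.5 A along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.5), (1.0, 0.0, -0.5)]))  # 0.5
```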

The C2B domain forming the secondary interface is slightly rotated
with respect to the SNARE complex between the two crystal forms
and between the Ca2+- and Mg2+-bound crystal structures (Extended
Data Fig. 5b, d), although the interactions at the secondary interface
itself are similar. The C2A domain that forms the tertiary interface is
in the same orientation in both crystal forms and in the Ca2+- and
Mg2+-bound crystal structures (Extended Data Fig. 5c, d), and the
interacting side chains are in similar positions. The interactions that
form the tertiary interface are primarily ionic, involving residues Syt1
R199 and R233, syntaxin-1A D218 and synaptobrevin-2 D57
(Fig. 1b), with well-resolved electron density.


The conserved primary interface 
We divide the primary interface into two regions (Fig. 2) dominated 
by polar interactions: region I comprises SNAP-25 residues E37, K40, 
N159, M163 and D166, and Sytl residues E295, K297, N336 and 
Y338; region II comprises SNAP-25 residues D51, E52 and E55, syn- 
taxin-1A residues D231, E234 and E238, and Sytl residues R281, 
K288, R398 and R399. The SNARE complex is positively charged 
and Syt1 is negatively charged in region I, whereas the opposite is 
observed for region II. In contrast, other synaptotagmin isoforms not 
involved in fast synchronous release exhibit sequence variations and 
differences in the electrostatic potential maps in both regions 
(Extended Data Fig. 6a, b), suggesting that the primary interface 
may be an interaction that is specific for fast synchronous release. 
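The region assignments above can be used to check which part of the primary interface a multi-site mutant touches. The following is an illustrative sketch only; the function names and the dictionaries are ours, built from the residue lists in this section, and are not part of the original analysis.

```python
import re

# Interface residues as listed in the text, grouped by region.
SYT1_REGIONS = {
    "I": {"E295", "K297", "N336", "Y338"},
    "II": {"R281", "K288", "R398", "R399"},
}
SNAP25_REGIONS = {
    "I": {"E37", "K40", "N159", "M163", "D166"},
    "II": {"D51", "E52", "E55"},
}

# One substitution: wild-type residue, position, replacement residue (e.g. R281A).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutant(name):
    """Split a name such as 'R281A/E295A' into (wild type, position, mutant) tuples."""
    parsed = []
    for token in name.split("/"):
        match = SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def regions_hit(name, regions):
    """Return the sorted list of interface regions touched by a mutant."""
    hit = set()
    for wild_type, position, _ in parse_mutant(name):
        for region, residues in regions.items():
            if f"{wild_type}{position}" in residues:
                hit.add(region)
    return sorted(hit)


print(regions_hit("R281A/E295A/Y338W/R398A/R399A", SYT1_REGIONS))  # ['I', 'II']
print(regions_hit("E295A/Y338W", SYT1_REGIONS))                     # ['I']
print(regions_hit("K40A/D51A/E52A/E55A/D166A", SNAP25_REGIONS))     # ['I', 'II']
```

The output confirms that both quintuple mutants combine substitutions in regions I and II. The polybasic-region mutant R322E/K325E maps to neither region.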

A subset of the interacting residues of region II was suggested to be
functionally important in previous studies: Syt1 R398/R399 (refs 35,
36) and SNAP-25 D51/E52/E55 (ref. 27). However, the role of these
residues was unclear in the absence of structural information, which
now reveals that they are part of a larger interface between Syt1 and
the SNARE complex. Moreover, interactions in region I (Fig. 2c) have
not been implicated in any previous studies.


The primary interface is critical 


We designed mutations of the critical interacting residues of the 
primary interface based on the crystal structure and verified that all 
mutants result in properly folded Syt1 and SNARE complex (Methods 
and Extended Data Fig. 7). To test the interactions in neurons, we
performed co-immunoprecipitation of syntaxin-1A in cultured Syt1
conditional knockout neurons infected with viruses expressing wild-
type Syt1 or Syt1 containing the designed C2B mutants (Fig. 3a and
Methods). The region II Syt1 mutant (R398Q/R399Q) has been
reported previously****, and is used here for comparison. Syntaxin-
1A was immunoprecipitated from lysates of these neurons, and the
presence of co-immunoprecipitated proteins was assayed with mono-
clonal antibodies against Syt1 and synaptobrevin-2. Mutation of
either region I or region II of the primary interface reduced Syt1 binding
to the SNARE complex by ~50%, and simultaneous mutation of both 
regions (referred to as ‘Syt1 quintuple’) reduced binding to ~33% as 
compared to a Syt1 wild-type construct introduced in the same man- 
ner (Fig. 3b, c). These results suggest that both regions are required for 
efficient Syt1-SNARE binding. 
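The quantification described in the Fig. 3c legend (Syt1 signal normalized to synaptobrevin-2, scaled to wild type, compared by Student's t-test) can be sketched as follows. This is an illustration under stated assumptions: the band intensities are invented, and scaling every sample by the mean wild-type ratio is our assumption about how the scaling was applied.

```python
import numpy as np
from scipy import stats


def normalised_coip(syt1_signal, syb2_signal, wt_mean_ratio):
    """Syt1 co-IP signal divided by synaptobrevin-2 signal, scaled so wild type = 1."""
    ratio = np.asarray(syt1_signal, float) / np.asarray(syb2_signal, float)
    return ratio / wt_mean_ratio


# Hypothetical band intensities (arbitrary units), n = 4 blots per condition.
wt_syt1 = [1.00, 0.92, 1.10, 0.98]
wt_syb2 = [1.00, 0.95, 1.05, 1.00]
mut_syt1 = [0.40, 0.35, 0.45, 0.38]
mut_syb2 = [1.02, 0.97, 1.01, 0.99]

wt_mean_ratio = (np.asarray(wt_syt1) / np.asarray(wt_syb2)).mean()
wt_norm = normalised_coip(wt_syt1, wt_syb2, wt_mean_ratio)
mut_norm = normalised_coip(mut_syt1, mut_syb2, wt_mean_ratio)

# Two-sample Student's t-test (equal variances) of mutant versus wild type.
t_stat, p_value = stats.ttest_ind(mut_norm, wt_norm)
print(f"mutant binding = {mut_norm.mean():.2f} x wild type, P = {p_value:.2g}")
```

Normalizing to synaptobrevin-2 corrects for differences in how much SNARE complex was pulled down in each lane.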

For further investigation of the function of the primary interface
between the neuronal SNARE complex and Syt1, we used a single-
vesicle content-mixing assay*’** and tested the effect of both Syt1
and SNAP-25 mutants on association, spontaneous fusion and
Ca2+-triggered fusion of single vesicles with reconstituted full-length
neuronal SNAREs, Syt1 and complexin-1 (Fig. 3d-g, Extended Data
Fig. 8 and Extended Data Table 2). The Syt1 mutants that disrupt the
interface between the SNARE complex and Syt1, as seen by co-immu-
noprecipitation (Fig. 3b, c), also reduced vesicle association, to similar
degrees for all four Syt1 mutants (left group of mutants in
Fig. 3d). Our reconstituted assay also allowed testing of the interacting
SNAP-25 residues (middle group of mutants in Fig. 3d), which likewise
significantly reduced vesicle association. Thus, as expected,
mutations on both the Syt1 and SNARE sides of the primary interface
reduce the interaction between them. Likewise, both groups of mutants
reduce the amplitude and decrease the synchronization of Ca2+-
triggered fusion, but they do not affect spontaneous fusion (Fig. 3e-g).
In particular, the Syt1 quintuple mutant significantly reduced Ca2+-
triggered synchronization, to the level of the control without Syt1.
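Synchronization in Fig. 3f is reported as the decay rate 1/τ from a single-exponential fit to the cumulative fusion histogram, with errors taken from the fit covariance matrix. A minimal sketch of that kind of fit, assuming a cumulative model A(1 − exp(−t/τ)) and using SciPy's Levenberg-Marquardt option of `curve_fit`, is shown below. The event times are simulated and the bin width is a hypothetical choice.

```python
import numpy as np
from scipy.optimize import curve_fit


def cumulative_single_exp(t, amplitude, rate):
    """Cumulative event count expected for a single exponential decay of events."""
    return amplitude * (1.0 - np.exp(-rate * t))


def fit_synchronisation(event_times, t_max, bin_width=1.0):
    """Fit the cumulative histogram of fusion times; return (rate, rate_error)."""
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts, _ = np.histogram(event_times, bins=edges)
    cumulative = np.cumsum(counts)
    t = edges[1:]  # right edge of each bin
    p0 = (cumulative[-1], 1.0 / np.mean(event_times))
    popt, pcov = curve_fit(cumulative_single_exp, t, cumulative, p0=p0, method="lm")
    return popt[1], np.sqrt(np.diag(pcov))[1]


rng = np.random.default_rng(0)
times = rng.exponential(scale=2.0, size=500)  # simulated events, tau = 2 s
rate, rate_err = fit_synchronisation(times, t_max=60.0, bin_width=0.5)
print(f"1/tau = {rate:.2f} +/- {rate_err:.2f} s^-1")
```

Fitting the cumulative histogram rather than the raw histogram avoids empty bins late in the observation window. The reported error is the square root of the corresponding diagonal element of the covariance matrix.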

To characterize the interaction between Syt1 and the SNARE com-
plex further, we investigated the effect of ATP, which has an ionic
shielding effect on certain cellular processes*”*°. We observed no
effect on either spontaneous or Ca2+-triggered fusion, and only a
mild effect on vesicle association (Fig. 3d-g, right control group),
suggesting that the functionally important interactions between
Syt1 and the SNARE complex are not affected by ionic shielding.


Evoked release requires primary interface 


We assayed release electrophysiologically in Syt1 conditional knock-
out neurons in which endogenous Syt1 was replaced with mutant Syt1
(Fig. 3a and Methods). As expected’, removal of Syt1 abolished
synchronous release, as monitored by recording evoked inhibitory
postsynaptic currents (eIPSCs), a phenotype that can be rescued by
re-introduction of wild-type but not mutant Syt1 cDNA (Fig. 4a, b).
Notably, the region I mutants retained some rescue ability, in contrast
to region II mutants, which were nearly non-functional (Fig. 4a, b),
including the R398Q/R399Q mutant as previously reported**. The
triple mutant (R281A/R398A/R399A) and the Syt1 quintuple mutant
that combines mutations in both regions showed even more severe
phenotypes. In contrast, a mutant that affects the polybasic region
of Syt1 (R322E/K325E) exhibited a milder phenotype, as reported
Figure 3 | Mutations of the primary interface affect binding and Ca2+-
triggered single vesicle-vesicle fusion. ‘Syt1 quintuple’ refers to the Syt1
mutant (R281A/E295A/Y338W/R398A/R399A). ‘SNAP-25 quintuple’ refers to
the SNAP-25 mutant (K40A/D51A/E52A/E55A/D166A). The colour code is
specified in the figure. a, Schematic diagram of Syt1 conditional knockout (cKO)
mice. Syt1 exon 2, which contains the transmembrane domain, is floxed.
Cre recombinase removes exon 2, ablating all cytoplasmic Syt1 sequences. b, Co-
immunoprecipitation (Co-IP) of either Syt1 (top row) or synaptobrevin-2
(bottom row) with a syntaxin-1A antibody in Syt1 conditional knockout
cultured neurons rescued with the indicated Syt1 mutant constructs. c, Quanti-
fication of co-immunoprecipitation of Syt1 normalized to synaptobrevin-2.
Results are scaled to Syt1 wild-type levels. All data are means ± s.e.m.; statistical
significance was analysed by the Student’s t-test comparing the mutants with
wild-type Syt1; **P < 0.01, n = 4 for Syt1 E295A/Y338W and Syt1 quintuple;
NS, no significant difference, n = 3 for Syt1 R398Q/R399Q; *P < 0.05, n = 4 for
Syt1 R281A/R398A/R399A. d-g, Bar graphs showing the effects of Syt1 and
SNAP-25 mutants on fusion of single vesicles with reconstituted neuronal
SNAREs, Syt1 and complexin-1 (see Methods and ref. 38). d, Number of
associated SV vesicles (see Methods) after incubation of SV vesicles with surface-
immobilized PM vesicles (see Methods) for a 1-min period. e, Number of
spontaneous fusion events over the subsequent 1-min observation period nor-
malized by the number of associated SV vesicles. f, Synchronization, that is,
decay rates (1/τ), of the histograms of fusion events upon 500 µM Ca2+ injection.
Error bars are error estimates computed from the covariance matrix upon fitting
the corresponding cumulative histograms with a single exponential decay
function using the Levenberg-Marquardt technique. g, Amplitude of the first 1-s
time bin upon Ca2+ injection. Each value in this panel was normalized by the
respective number of fusion events after Ca2+ injection. d, e, g, All data are
means ± s.e.m.; the numbers of independent repeat experiments are given
above the bars and in Extended Data Table 2; statistical significance was assessed
by the Student’s t-test comparing all other conditions with wild type (**P < 0.005;
*P < 0.05). The cumulative fusion histograms are shown in Extended Data Fig. 8.
Controls in panels d-g are in the presence of 3 mM ATP and in the absence
of SNAP-25 and Syt1. As expected, Ca2+-triggered fusion required the pres-
ence of both SNAP-25 and Syt1; fusion is not affected by the presence of ATP.
Figure 4 | Mutations of the primary interface impair Syt1 function in Ca2+-
triggered release. Recording of inhibitory postsynaptic currents (IPSCs) from
cultured Syt1 conditional knockout hippocampal neurons infected with
lentiviruses expressing Cre recombinase and Syt1 mutants of the primary
interface and a mutant of the polybasic region (R322E/K325E, ref. 30). All
recordings were performed in the presence of 6-cyano-7-nitroquinoxaline-2,3-
dione (CNQX, 20 µM) and D-2-amino-5-phosphonovalerate (AP-5, 50 µM)
using a high-Cl− internal solution. a, b, Sample traces of evoked IPSCs from
single action potentials (a) and quantification of peak amplitudes (b). Tick
marks indicate stimulus delivery. All data are means ± s.e.m.; numbers of cells/
independent cultures analysed are given above the bars; statistical
significance was assessed by one-way analysis of variance comparing all other
conditions with the wild-type rescue group (***P < 0.001). c-e, Syt1 mutants
display facilitation, instead of depression, during high-frequency stimulation.
c, Sample traces of 10 Hz trains; d, e, quantification of absolute (d) and
normalized (e) IPSC amplitudes during the train; numbers of cells/independent
cultures analysed are given in parentheses in the labels for each of the traces.
AP, action potential. f, g, Syt1 mutants are unable to clamp the frequency of
spontaneous IPSCs (sIPSCs). f, Sample spontaneous IPSC traces;
g, quantification of event frequency (left) and amplitude (right). All data are
means ± s.e.m.; the numbers of cells/independent cultures analysed are given
above the bars. Statistical significance was assessed by one-way analysis of
variance comparing all other conditions with the wild-type rescue group
(***P < 0.001; NS, no significant difference).


previously*®. During high-frequency stimulation, wild-type cultured 
hippocampal neurons displayed depression while Syt1 conditional 
knockout neurons showed asynchronous release with robust facilita-
tion (Fig. 4c-e). All Syt1 mutants underwent facilitation, and the
severity of the phenotype for each mutant correlated well with the 
results for single evoked release. Another known consequence of Syt1 
removal is the unclamping of spontaneous release. Interestingly, the 
E295A/Y338W mutant could rescue this phenotype while the region 
II mutants (or combinations thereof) could not (Fig. 4f, g). There were 
no differences in spontaneous miniature IPSC amplitudes between 
wild type and mutant rescues (Fig. 4g). 
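The distinction between depression and facilitation during a train (Fig. 4c-e) comes down to comparing later responses with the first. As a simplified sketch, we assume each IPSC amplitude is normalized to the first response in the train; the Fig. 4e legend does not state the reference. The amplitudes below are invented for illustration.

```python
import numpy as np


def normalised_train(amplitudes):
    """Scale each IPSC peak in a train to the first response (assumed normalization)."""
    amps = np.abs(np.asarray(amplitudes, dtype=float))
    return amps / amps[0]


def short_term_plasticity(amplitudes, n_late=5):
    """Label a train 'facilitation' if late responses exceed the first, else 'depression'."""
    norm = normalised_train(amplitudes)
    return "facilitation" if norm[-n_late:].mean() > 1.0 else "depression"


# Hypothetical peak amplitudes (nA) for a 10-stimulus train at 10 Hz.
wild_type = [-5.0, -4.1, -3.6, -3.3, -3.1, -3.0, -2.9, -2.9, -2.8, -2.8]
mutant = [-0.4, -0.6, -0.8, -0.9, -1.0, -1.1, -1.1, -1.2, -1.2, -1.2]
print(short_term_plasticity(wild_type))  # depression
print(short_term_plasticity(mutant))     # facilitation
```

This peak-based classification ignores the asynchronous component that dominates in Syt1-deficient neurons. It illustrates only the normalization step.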

Together, these observations support the notion that the primary 
Syt1-SNARE interface is critical for the role of Syt1 as a Ca2+ sensor
Figure 5 | Model of the role of the primary Syt1-SNARE interface. a, b, Unit
that is formed by the primary interface between Syt1 C2B and the SNARE
complex. a, Cartoon representation with positively charged side chains shown
as sticks; b, electrostatic potential map looking towards the positively charged
face of the Syt1 C2B-SNARE unit. c-e, Proposed function of the Syt1 C2B-
SNARE unit. c, Initial state before Ca2+ triggering. The juxtamembrane linkers
of synaptobrevin-2 and of syntaxin-1A were modelled as random coils”.
d, Intermediate state after Ca2+ triggering when the membranes are close
enough to promote stalk formation”. e, Fusion pore formation. Zigzag lines
indicate palmitoylated cysteine residues of the SNAP-25 linker region. PM,
plasma membrane; VM, vesicle membrane. f, Other interfaces found in the
crystal structure, along with the primary interface, could form a connected
network of SNARE complexes that surrounds the point of contact between
membranes. The left panel is a top-down view onto the point of contact
between membranes; the right panel is a rotated projection view.


of evoked release, but that this Syt1-SNARE interface is not required
for spontaneous release in neurons, consistent with the differential
effect of Syt1 mutations on spontaneous and evoked release'’"*. We
note that spontaneous release observed in our reconstituted system*
reflects the fusion probability at exactly zero Ca2+ concentration (that
is, in the absence of a Ca2+ sensor for mini-release), whereas spon-
taneous release in neurons is driven by resting Ca2+ concentrations
and is largely blocked by removing all calcium“. This difference
may explain the different behaviour for the E295A/Y338W mutant in 
neuronal cultures and in the reconstituted system. Moreover, there is 
little effect on synchronization of the R398Q/R399Q mutant of Syt1 in 
the reconstituted system, which may indicate the existence of addi- 
tional interactions involving these residues with other factors not 
present in our minimal reconstituted system. 


Model of Syt1-mediated Ca2+ triggering

Several mechanisms and models have been proposed for how Syt1 
and the neuronal SNARE complex cooperate”’*”*’. These models, 
however, were devised based on biochemical data without atomic- 
resolution structural information. Our crystal structures of the Syt1- 
SNARE complex now suggest a parsimonious mechanism that 
involves a concerted action of the unit consisting of the Syt1 C2B
domain and the SNARE complex (Fig. 5a). We first focus on this unit
alone and then place it in the context of full-length Syt1 and other
interactions. The combined surface of the Syt1 C2B-SNARE unit
forms a flat face with an extensive pattern of positive charges 
(Fig. 5b) that includes the polybasic region of Syt1 as well as basic 
residues at the membrane-proximal side of the neuronal SNARE 
complex. The effect of the Syt1 R322E/K325E mutant (Fig. 4) as well 
as reported observations””*°***** suggests a functional role for the 
polybasic region. 
Before Ca2+ influx, we assume a membrane-juxtaposed, hemifu-
sion-free state (Fig. 5c), the likely starting state for fast Ca2+-triggered
fusion’’, and a partially folded trans-SNARE complex**. We assume
that the Syt1 C2B-SNARE unit can assemble in this state since the
primary interface involves residues within the folded region of the
trans-SNARE complex. The palmitoylated cysteine residues of SNAP-
25 would be able to interact with the plasma membrane, and the
membrane-proximal side of syntaxin is closely juxtaposed with the
membrane, as inferred from the requirement of tight membrane coup-
ling of syntaxin for evoked release”. Upon Ca2+ binding to the Syt1
C2B domain, the Ca2+-binding loops partially insert into the mem-
brane” with a preference for membrane curvature”®”’, possibly sup-
ported by ionic interactions via the polybasic region and with
continuing maintenance of the Syt1-SNARE interface. We propose
that the Syt1 C2B-SNARE unit moves en bloc as an entity upon Ca2+
triggering. The simultaneous membrane interactions of the Ca2+
binding loops, of the polybasic region, and of the membrane-proximal 
region of the SNARE complex would therefore require a deformation 
of the plasma membrane. This morphological change of the plasma 
membrane juxtaposes the membranes closer than the critical distance 
(0.9 nm) to promote stalk formation** (Fig. 5d), and subsequently 
leads to fusion pore opening (Fig. 5e). 

It is likely that Syt1 engages in multiple functionally relevant inter-
actions with neuronal SNARE complexes, including the other interfaces
found in our crystal structures. All of these interactions may be
anchored by the primary interface, which is the most stable and exten-
sive contact site. We speculate that multiple SNARE complexes could
be linked by Syt1 through some or all of the interfaces
(primary, secondary, tertiary, C2A-C2B) that are observed in our crys-
tal structures (Fig. 5f). In addition to the primary and secondary inter-
faces formed by C2B domains (Fig. 5f, gold), the tertiary interface
involves a C2A domain (Fig. 5f, grey). This interface could be involved
in displacing complexin from its interaction with the core of the SNARE
complex, since complexin binds in the groove between syntaxin-1A and
synaptobrevin-2 (ref. 49); partial displacement of complexin may be
important for neurotransmitter release”. Interestingly, the gain-of-
function mutation D232N in the Ca2+-binding region of the C2A
domain”® is close to the tertiary interface and is also part of the
C2A-C2B interface (Fig. 1b). In the proposed network of interactions
there are C2A domains (Fig. 5f, purple) that have no interactions with
SNARE complexes; we envision that they could interact with a mem-
brane. This entire Syt1-SNARE assembly would thus be poised to con-
fer cooperativity upon Ca2+ triggering on a sub-millisecond timescale.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

Strategy for crystallization of the Syt1-SNARE complex. Successful crystalliza-
tion of the Syt1-SNARE complex involved extensive testing of multiple designs of
covalently linked chimaeras between a Syt1 C2AB fragment and different SNARE 
domain fragments of the neuronal SNARE complex (both SNARE domains of 
SNAP-25, denoted SNAP-25_N and SNAP-25_C, as well as the SNARE domain 
of syntaxin-1A). Three different linker lengths for the chimaeras were tested (16, 
23 and 37 amino acids) derived from the linker sequence of the human Oct-1 
transcription factor*’. This particular linker sequence had been used previously to 
crystallize the Arf-ArfGAP complex”. We also tested different truncations of the 
neuronal SNARE complex*”’. The resulting constructs were screened for protein 
expression and homogeneity by ion exchange and size-exclusion chromato- 
graphy. The best constructs resulted in a Syt1-SNARE complex that eluted in a 
mono-disperse peak in the final size-exclusion chromatography step, and that 
remained stable in SDS-PAGE without boiling, a hallmark for neuronal SNARE 
complex formation (Extended Data Fig. 1b). Best results were obtained when the 
SNARE fragments were truncated as previously described”. The best candidates 
were used for crystallization trials (see below). 

Cloning, expression and purification of the Syt1-SNARE complex. The SNAP-
25 isoform used throughout this study is commonly referred to as isoform 2 or
SNAP-25A. The C2AB fragment of rat synaptotagmin-1 (amino acid range 141- 
421) was fused to the amino terminus of the C-terminal SNARE domain of rat 
SNAP-25 (SNAP-25_C, amino acid range 141-204) via a 37-amino-acid linker 
(sequence NLSSDSSLSSPSALNSLSSPSALNSTASNSPGIEGLS) derived from the 
human Oct-1 transcription factor*' (Extended Data Fig. 1a) (referred to as C2AB- 
linker-SNAP-25_C). 
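The residue ranges above fix the size of the C2AB-linker-SNAP-25_C chimaera. The arithmetic below is a quick sketch. We assume the inclusive ranges are used exactly as written, and we exclude any tag or cloning-scar residues, which the text does not specify.

```python
def span_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


LINKER = "NLSSDSSLSSPSALNSLSSPSALNSTASNSPGIEGLS"  # Oct-1-derived linker from the text

c2ab = span_length(141, 421)      # rat Syt1 C2AB fragment
snap25_c = span_length(141, 204)  # rat SNAP-25_C fragment
assert len(LINKER) == 37          # matches the stated 37-amino-acid linker

chimaera = c2ab + len(LINKER) + snap25_c
print(c2ab, snap25_c, chimaera)  # 281 64 382
```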

The C2AB-linker-SNAP-25_C chimaera, the rat SNAP-25_N fragment 
(amino acid range 7-83), the rat syntaxin-1A fragment (amino acid range 191- 
256), and the His-tagged rat synaptobrevin-2 fragment (amino acid range 28-89) 
were cloned into the Duet expression system (Novagen) following previous work 
with the neuronal SNARE complex™ (Extended Data Fig. 1a). These four protein 
constructs were co-expressed in Escherichia coli, leading to complex formation in 
the host (referred to as Syt1-SNARE*”**""" complex). Specifically, E. coli
BL21(DE3) cells were grown overnight at 30 °C using auto-inducing LB med-
ium**. After harvesting the cells by centrifugation, the pellet was resuspended
in lysis buffer (50 mM Tris-HCl, pH 8.0, 300 mM NaCl, 20 mM imidazole,
1 mM CaCl2, 0.5 mM TCEP and EDTA-free protease inhibitor cocktail
(Roche)) and lysed by three passes through an Emulsiflex C5 homogenizer
(Avestin) at 15,000 p.s.i. After centrifugation, the cleared lysate was bound to a
4-ml bed volume of Ni-NTA beads (Qiagen) equilibrated in the lysis buffer. Beads
were harvested by centrifugation and poured into a column, washed with the lysis
buffer, and subsequently washed with the lysis buffer supplemented with an addi-
tional 40 mM imidazole. The Syt1-SNARE*”**""** complex was eluted with the
lysis buffer supplemented with an additional 350 mM imidazole.
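Because the wash and elution buffers are described as lysis buffer "supplemented with an additional" amount of imidazole, the total imidazole at each Ni-NTA step follows by addition. The snippet below tabulates those totals, assuming the supplements are added on top of the 20 mM already in the lysis buffer.

```python
LYSIS_IMIDAZOLE_MM = 20  # imidazole already present in the lysis buffer


def total_imidazole(extra_mm):
    """Total imidazole (mM) when 'additional X mM' is added to lysis buffer."""
    return LYSIS_IMIDAZOLE_MM + extra_mm


NI_NTA_STEPS = [
    ("binding and first wash (lysis buffer)", 0),
    ("second wash", 40),
    ("elution", 350),
]

for step, extra in NI_NTA_STEPS:
    print(f"{step}: {total_imidazole(extra)} mM imidazole")
```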

Depending on whether Ca2+-free or Ca2+-bound complex was purified, EDTA or
CaCl2 was included at specified steps. The fresh eluate of the Ni-NTA-affinity-
purified Syt1-SNARE***""** complex was pooled and dialysed against
dialysis buffer I (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.5 mM TCEP; with
1 mM CaCl2 or 10 mM EDTA) for 3 to 4 h at 4 °C. The dialysate was supplemen-
ted with TEV protease, and further dialysed in dialysis buffer II (50 mM Tris-HCl,
pH 8.0, 50 mM NaCl, 0.5 mM TCEP; with 1 mM CaCl2 or none) overnight at 4 °C.
After removal of uncleaved sample, the His-tag-cleaved complex was subjected
to anion exchange chromatography (buffer A: 50 mM Tris-HCl, pH 8.0,
50 mM NaCl, 0.5 mM TCEP; buffer B: 50 mM Tris-HCl, pH 8.0, 500 mM NaCl,
0.5 mM TCEP; both buffer A and B were supplemented with 1 mM CaCl2 or
2 mM EDTA) using a linear gradient of NaCl starting at 50 mM and ending at
250 mM. The peak fractions were pooled, concentrated and loaded onto a
Superdex 200 10/300 GL column (GE Healthcare) that was pre-equilibrated with
SEC buffer (20 mM Tris-HCl, pH 8.0, 300 mM NaCl, 0.5 mM TCEP) for Ca2+-
free complex, and SEC buffer supplemented with 1 mM CaCl2 for Ca2+-bound
complex (Extended Data Fig. 1b). The peak fractions containing Syt1-
SNARE***#"*©* complex were pooled and concentrated to a final concentration
of ~20 mg ml−1 for crystallization.
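With buffer A at 50 mM NaCl and buffer B at 500 mM NaCl, the 50 to 250 mM NaCl gradient corresponds to a particular range of buffer B on the chromatography system. The conversion below is a sketch that assumes ideal linear mixing of the two buffers.

```python
def percent_b(target_nacl_mm, a_mm=50.0, b_mm=500.0):
    """Percentage of buffer B giving a target NaCl concentration in an A/B mix."""
    if not a_mm <= target_nacl_mm <= b_mm:
        raise ValueError("target outside the range spanned by buffers A and B")
    return 100.0 * (target_nacl_mm - a_mm) / (b_mm - a_mm)


start, end = percent_b(50), percent_b(250)
print(f"gradient: {start:.1f}% B -> {end:.1f}% B")  # gradient: 0.0% B -> 44.4% B
```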

Cloning, expression and purification of wild-type and mutant Syt1 C2B 
domains. All wild-type and mutant synaptotagmin-1 C2B domains were pre- 
pared by using a standard PCR-based protocol. All PCR products were subcloned 
into the pGEX-6P-1 (GE Healthcare) and expressed as GST-tagged fusion pro- 
teins in E. coli BL21(DE3) cells at 30 °C overnight using auto-inducing LB med- 
ium”. After harvesting the cells by centrifugation, the sample was resuspended in 
lysis buffer containing 50 mM HEPES-Na, pH 7.5, 300 mM NaCl, 2 mM DTT and
EDTA-free protease inhibitor cocktail (Roche), and then subjected to sonication
and centrifugation. The supernatant was incubated with glutathione-sepharose
beads (GE Healthcare). The resin was extensively washed with 50 ml of wash
buffer I containing 50 mM HEPES-Na, pH 7.5, 300 mM NaCl, and 1 mM DTT,
followed by 50 ml of wash buffer II containing 50 mM HEPES-Na, pH 7.5,
300 mM NaCl, 1 mM DTT, and 50 mM CaCl2. The GST tag was cleaved over-
night at 4 °C with PreScission protease (GE Healthcare) in cleavage buffer con-
taining 50 mM HEPES-Na, pH 7.5, 300 mM NaCl, 1 mM DTT, 2 mM EDTA. The
cleaved proteins were purified by gel filtration on Superdex 75 (GE Healthcare)
that was pre-equilibrated with SEC buffer containing 20 mM Tris-HCl, pH 8.0,
150 mM NaCl, 0.5 mM TCEP. The peak fractions were pooled and concentrated
to a final concentration of ~30 mg ml−1 for circular dichroism analysis and
crystallization.

Crystallization of the Ca2+-bound Syt1-SNARE complex. Crystallization trials
performed by mixing the neuronal SNARE complex and Syt1 C2AB fragments
were unsuccessful, probably owing to the weak affinity between the two compo-
nents and their aggregation propensity at higher concentrations, especially in the
presence of Ca2+. To overcome these problems, crystallization trials were per-
formed with chimaeras between Syt1 C2AB and SNAP-25_C, as described
above. The three best candidates differed in linker length.
Initially, thin crystalline plates that diffracted to about 30 Å were obtained
from a construct connected by a 23-amino-acid linker. Screening of volatile
crystallization additives (such as methanol, ethanol and 1,2-butanediol) led to
improved diffraction to about 10 Å resolution, with sharp Bragg peaks. The
complex with a longer 37-amino-acid linker (Syt1-SNARE*”**"“*") was more
soluble in SEC buffer than the one with the shorter 23-amino-acid
linker. Clusters of needle-shaped crystals were obtained for this construct at
first. Using a reverse vapour-diffusion method led to thicker crystal plates
(Extended Data Fig. 1e), and eventually produced Bragg reflections past 4 Å
resolution. There were consistently two crystal forms in the same drop with
identical morphology, referred to as the ‘short unit cell’ and ‘long unit
cell’ crystal forms, respectively. The difference between the short and long unit
cell crystal forms of the Ca2+-bound Syt1-SNARE complex can be approxi-
mately described by a doubling of the number of complexes, except that one of
the interacting Syt1 C2AB fragments is absent (Fig. 1c, Extended Data Fig. 2g
and Extended Data Fig. 3d).

Before setting up crystal trays, purified Syt1-SNARE complex
(at a concentration of ~20 mg ml−1) was diluted to a final concentration of
~8 mg ml−1 and supplemented with CaCl2 and MgCl2. The final buffer contained
20 mM Tris-HCl (pH 8.0), 300 mM NaCl, 100 mM MgCl2, 1 mM CaCl2 and
0.5 mM TCEP. The reservoir contained 100 mM HEPES-Na (pH 7.5) and 1%
PEG 8000. Equal volumes of protein and reservoir (2 µl each) were mixed and incu-
bated at 20 °C. The crystals appeared after 1 to 4 months, and were flash-frozen in
a cryo-protecting solution containing the same constituents as the crystallization
condition supplemented with 35% (v/v) sucrose.
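The dilution and drop set-up above follow C1V1 = C2V2. As an illustration, the snippet below computes the stock volume needed for a hypothetical 50-µl batch of ~8 mg ml−1 complex and the protein concentration in a 2 µl + 2 µl drop at set-up. The batch volume is our assumption, and the concentrations change afterwards as vapour diffusion proceeds.

```python
def stock_volume(c_stock, c_final, v_final):
    """C1*V1 = C2*V2: volume of stock needed to reach c_final in a total of v_final."""
    if c_final > c_stock:
        raise ValueError("cannot reach a higher concentration by dilution")
    return c_final * v_final / c_stock


def drop_concentration(c_protein, v_protein, v_reservoir):
    """Protein concentration immediately after mixing protein and reservoir in the drop."""
    return c_protein * v_protein / (v_protein + v_reservoir)


v = stock_volume(20.0, 8.0, 50.0)  # mg/ml, mg/ml, ul (hypothetical 50-ul batch)
print(f"{v:.1f} ul stock + {50.0 - v:.1f} ul buffer and additives")  # 20.0 ul + 30.0 ul
print(drop_concentration(8.0, 2.0, 2.0))  # 4.0 (mg/ml at set-up)
```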

Although Syt1 was initially covalently linked to the C-terminal half of SNAP- 
25 (SNAP-25_C), the linker was slowly cleaved at ambient temperature 
(Extended Data Fig. 1c). Moreover, crystal growth required 1 to 4 months, and the
crystals contained entirely cleaved complex (Extended Data Fig. 1b), enabling formation
of the 2:1 and 3:2 stoichiometries in the asymmetric units of the short and long
unit cell crystal forms, respectively. Thus, the crystal structures are probably not
affected by the initial presence of the linker. We simply refer to the resulting
crystal structure as that of the Syt1-SNARE complex.

Crystallization of the Mg2+-bound Syt1-SNARE complex. Crystals of Mg2+-
bound Syt1-SNARE complex were grown using the reverse hanging-drop
vapour-diffusion method. Purified Syt1-SNARE*”**""** (at a concentration of
~20 mg ml−1) that was prepared without Ca2+ and in the presence of EDTA (as
described above) was diluted to a final concentration of ~4 mg ml−1 and supple-
mented with 100 mM MgCl2. The final buffer contained 20 mM Tris-HCl (pH
8.0), 300 mM NaCl, 100 mM MgCl2, and 0.5 mM TCEP. The reservoir contained
100 mM HEPES-Na (pH 7.5) and 2.5% PEG 3350. As for the Ca2+-bound com-
plex, the linker was also cleaved during crystallization. Mg2+-bound Syt1-
SNARE complex crystals appeared after 2 months and were flash-frozen in a
cryo-protecting solution containing the same constituents as the crystallization
condition supplemented with 20% (v/v) glycerol.

Crystallization of the quintuple mutant of the Syt1 C2B domain. Crystals of
the quintuple mutant (R281A/E295A/Y338W/R398A/R399A) of the Syt1 C2B
domain were grown by the hanging-drop vapour-diffusion method at 20 °C by
mixing 2 µl protein solution (at a concentration of ~10 mg ml−1 in a buffer
containing 20 mM Tris-HCl, pH 8.0, 150 mM NaCl, and 0.5 mM TCEP) with an
equal volume of reservoir solution containing 100 mM Tris-HCl (pH 8.5) and
1.5 M ammonium sulfate.

XFEL data collection and processing. The diffraction data of crystals of the long
unit cell crystal form of the Ca2+-bound Syt1-SNARE complex (Extended Data
Table 1) were collected at the X-ray Pump Probe (XPP) endstation using the LCLS
XFEL at the SLAC National Accelerator Laboratory at Stanford University,
equipped with a Rayonix MX325 X-ray imaging detector. The XFEL beam was
focused to 30 µm at a nominal energy of 9 keV and a pulse duration of 40 fs in self-
amplified stimulated emission (SASE) mode***’. Each 40-fs XFEL pulse at the
XPP endstation at LCLS delivers 10¹² photons, exceeding a dose of 30 MGy.
The dose is so large that the diffraction volume is vaporized after exposure to a
single XFEL pulse (Extended Data Fig. 2b). Nonetheless, the 40-fs laser pulse
enables so-called ‘diffraction before destruction’ data collection™**°, whereby
the diffraction pattern is recorded before the sample is destroyed.
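The 9 keV nominal photon energy sets the X-ray wavelength, and hence, through Bragg's law, the scattering angle at which a given resolution shell is recorded. The sketch below uses the standard conversion λ(Å) = hc/E. The 3.5 Å spacing is only an illustrative value, not a resolution reported here.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV*Angstrom


def wavelength_angstrom(energy_kev):
    """X-ray wavelength (Angstrom) for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def bragg_two_theta_deg(d_spacing, wavelength):
    """Scattering angle 2*theta (degrees) from Bragg's law, lambda = 2 d sin(theta)."""
    s = wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("reflection not accessible at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


lam = wavelength_angstrom(9.0)
print(f"lambda = {lam:.3f} Angstrom")  # lambda = 1.378 Angstrom
print(f"2theta for d = 3.5 Angstrom: {bragg_two_theta_deg(3.5, lam):.1f} deg")
```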

The diffraction images were obtained from 148 frozen and cryo-protected 
crystals mounted in conventional cryo-loops at 100 K using a goniometer-based 
fixed-target sample delivery station®’. XFEL diffraction tests of hydrated crystals 
at room temperature did not show any marked improvement in comparison to 
cryo-preserved crystals, at least in terms of visible limiting resolution of the 
observed diffraction pattern. Furthermore, the entire crystal was damaged when 
exposed at room temperature, requiring many more crystals and much more 
time to collect a complete diffraction data set. As beam time and sample 
volume were limited, we decided to collect diffraction data at cryogenic temper- 
ature, since it was possible to obtain multiple diffraction images from the same 
frozen crystal. 

In most cases, 2-20 diffraction images (depending on crystal size and quality) 
were obtained from a single crystal. Since the diffracting volume was destroyed by 
the XFEL beam, a different volume had to be exposed for every shot (Extended 
Data Fig. 2b). The crystal was translated at least 100 µm between exposures.
During data collection we observed that improved diffraction quality could be 
achieved by placing consecutive shots far apart (that is, across the length of the 
crystal from one another), then later ‘filling’ between the first and second shots in 
the same ‘shuttling’ manner. Finally, care was taken to collect exposures at a 
variety of spindle angles to maximize the completeness of the final data set.
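One way to formalize the 'shuttling' strategy, with consecutive shots far apart followed by filling in between them, is a breadth-first bisection over grid positions spaced at least 100 µm apart. This is our interpretation for illustration only. The function name and the assumption of a one-dimensional grid along the crystal are hypothetical, not the procedure actually scripted at the beamline.

```python
from collections import deque


def shuttle_order(length_um, spacing_um=100.0):
    """Order grid positions along a crystal: both ends first, then midpoints of the
    remaining gaps in breadth-first order, so early shots are widely separated."""
    n = int(length_um // spacing_um) + 1
    positions = [i * spacing_um for i in range(n)]
    if n == 1:
        return positions
    order = [0, n - 1]
    gaps = deque([(0, n - 1)])
    while gaps:
        lo, hi = gaps.popleft()
        if hi - lo < 2:  # no unused grid point between lo and hi
            continue
        mid = (lo + hi) // 2
        order.append(mid)
        gaps.append((lo, mid))
        gaps.append((mid, hi))
    return [positions[i] for i in order]


print(shuttle_order(600.0))  # [0.0, 600.0, 300.0, 100.0, 400.0, 200.0, 500.0]
```

Every grid point is visited exactly once, and neighbouring positions are never shot consecutively in this example.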

Of the 148 crystals used in the experiment, 113 crystals produced 578 images 
that could be processed; the other crystals did not diffract or showed multiple 
lattices. The long unit cell form occurred much more frequently (about 80%). 
Images indexed in the short unit cell crystal form were identified using a hier- 
archical clustering method® and were omitted from further processing since it 
was not possible to obtain a complete data set in this crystal form. Other images 
were rejected during data processing as described below. The 309 diffraction 
images (from 72 crystals) in the final selection for the long unit cell crystal form 
were indexed and integrated with the cctbx.xfel suite of data-processing soft- 
ware **, with the diffraction data processing parameters optimized by a grid 
search procedure (Lyubimov et al., manuscript in preparation). The integrated 
diffraction data were subsequently scaled, merged and post-refined with the 
PRIME software”. 
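Separating images from the two crystal forms by hierarchical clustering can be illustrated on per-image unit-cell lengths. In the sketch below the cell values are invented, average linkage on Euclidean distance is our choice, and the published clustering method may use different features and metrics.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Invented unit-cell lengths (a, b, c in Angstrom) from per-image indexing.
cells = np.array([
    [70.1, 130.2, 250.5], [69.8, 129.7, 251.0], [70.3, 130.5, 249.8],
    [70.0, 130.0, 250.2], [69.9, 129.9, 250.9],
    [70.2, 130.1, 160.4], [69.7, 130.4, 159.9],
])

Z = linkage(cells, method="average")             # agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two groups
dominant = np.bincount(labels).argmax()          # most populated cluster label
keep = np.flatnonzero(labels == dominant)
print(f"{keep.size} of {len(cells)} images assigned to the dominant cell")
```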

We optimized the data integration process using a combination of mosaic 
quality analysis”, highest number of bright reflections yielded by integration, 
and overlay of observed reflections on the actual integrated areas to determine 
whether nearly all visible reflections were integrated, and no unobserved reflec- 
tions were predicted and integrated. We inspected overlays of diffraction images 
and predicted reflection positions to fine-tune the computational approaches and 
optimize the data processing of the XFEL data. The iterative scaling and post- 
refinement approach used by PRIME allowed the construction of a complete 
diffraction data set from the relatively small number of diffraction images. 

Synchrotron beamline data collection. All other diffraction data sets (Extended
Data Table 1) were collected using beamline 24ID-C of the Advanced Photon
Source (APS) at Argonne National Laboratory (Argonne, IL). Diffraction data of
the best crystals of both the Ca2+-bound short unit cell crystal form and the
Mg2+-bound short unit cell crystal form of the Syt1-SNARE complex were
indexed and integrated using the XDS software®, and scaled and merged using
the SCALA program in the CCP4 package’. Diffraction data of the quintuple mutant
of the Syt1 C2B domain were indexed, integrated, scaled and merged using the
XDS software®*.

Structure determination. The phases for all crystal structures of the Syt1-SNARE
complex were determined by molecular replacement with Phaser® using the rat
SNARE complex (Protein Data Bank (PDB) code 1N7S), the rat Syt1 C2A domain
(PDB code 3F04), and the rat Syt1 C2B domain (PDB code 1UOW) as search
models. The structures were iteratively rebuilt and refined using the programs
Coot, CNS1.3”°”!, and phenix.refine”’ (Extended Data Table 1). Both the Ca2+-
and Mg2+-bound Syt1-SNARE complexes in the short unit cell crystal form
were refined with phenix.refine” using non-crystallographic symmetry (NCS)
restraints, secondary structure restraints, and grouped B-factor refinement.
The long unit cell crystal form of the Ca2+-bound Syt1-SNARE complex was
initially refined using CNS 1.3””', with DEN restraints”, restrained grouped
B-factors and NCS restraints, then further refined with phenix.refine” using
NCS restraints, secondary structure restraints,
and individual B-factor refinement. mFo − DFc annealed omit maps (Extended
Data Fig. 2c and Extended Data Fig. 3b) were calculated with phenix.refine
using omit refinement consisting of three cycles of simulated annealing and
grouped/individual B-factor refinement (grouped B-factor refinement for the
Ca²⁺-bound Syt1-SNARE complex in the short unit cell crystal form, individual
B-factor refinement for the Ca²⁺-bound Syt1-SNARE complex in the long unit
cell form) with NCS restraints and secondary structure restraints. Residues
335-340 in Syt1 and 159-166 in SNAP-25 were omitted (that is, residues in
region I of the primary interface between Syt1 and the SNARE complex), and
atoms within a 4 Å cushion around these omitted residues were kept fixed.
The R values of the refined structures (Extended Data Table 1) are well within
the range that is typical at the corresponding resolutions.
Ramachandran analysis with MolProbity indicated that 98% of the residues
are in the favoured regions and none are in disallowed regions for both the Ca²⁺-
and Mg²⁺-bound Syt1-SNARE complexes in the short unit cell crystal form,
and that 97% of the residues are in the favoured regions and none are in
disallowed regions for the Ca²⁺-bound Syt1-SNARE complex in the long unit cell
crystal form. The quality indicators for the crystal structures of the Syt1-SNARE
complex are well within the acceptable ranges indicated by the ‘polygon’ plot
produced by phenix.refine and by the validation reports of the deposited
structures and diffraction data.

The phases for the crystal structure of the quintuple mutant of the Syt1 C2B
domain were determined by molecular replacement with Phaser using the rat
Syt1 C2B domain (PDB code 1UOW) as the search model. The structure was
iteratively built and refined using Coot and phenix.refine
(Extended Data Table 1). The final model consists of four Syt1 C2B molecules
in the asymmetric unit. Ramachandran analysis with MolProbity indicated that
98% of the residues are in the favoured regions and none are in disallowed
regions.
The LCLS XFEL produced superior electron density maps. The diffraction data 
obtained at the LCLS XFEL extended to substantially higher resolution than data 
collected at the APS NE-CAT microfocus synchrotron beamline from similar 
crystals of the long unit cell form (Extended Data Fig. 2a, b). Notably, only 1
out of 85 screened crystals in the long unit cell form diffracted to 4.1 Å at APS,
whereas 61 out of ~72 long unit cell crystals diffracted to at least 3.5 Å at LCLS.
In fact, only a lack of available XFEL beam time prevented us from collecting a
complete diffraction data set beyond 3.5 Å. Similar improvements in the
limiting resolution of XFEL versus synchrotron diffraction images have been
observed for GPCRs. Interestingly, although the short unit cell crystal form
reached a limiting resolution of 3.6 Å at APS NE-CAT, its density maps are
substantially less well defined for side chains than those from the LCLS data set
(compare Extended Data Fig. 2c-f and Fig. 3b, c).

Taken together, the LCLS diffraction data set proved notably superior to the 
particular synchrotron-derived data sets that we collected in terms of limiting 
resolution and quality of the electron density maps, and was thus essential for 
more accurately determining side-chain positions. Moreover, in the LCLS XFEL
crystal structure there is clear electron density for 19 Ca²⁺ ions bound to the
Ca²⁺-binding sites of the Syt1 C2 domains and to a few additional sites on the
surfaces of Syt1 and SNARE molecules (Extended Data Table 1). In contrast,
electron density could be identified for only 7 Ca²⁺ ions in the short unit cell
crystal form collected at the APS synchrotron. Even taking into account the
smaller number of molecules in the asymmetric unit of the short unit cell crystal
form (two Syt1 and one SNARE complex, compared to three Syt1 and two SNARE
complexes), fewer Ca²⁺ sites were observed in electron density maps derived
from the data set collected at APS, suggesting that Ca²⁺-binding sites have been
affected by radiation damage.
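The molecule-count normalization above can be checked with simple proportional scaling. The sketch below assumes, purely for illustration, that the number of visible Ca²⁺ sites scales linearly with either the number of Syt1 molecules or the number of SNARE complexes in the asymmetric unit; because the 19 sites lie on both Syt1 and SNARE surfaces, it brackets the expectation using both counts. The function name `scaled_expectation` is hypothetical.

```python
from fractions import Fraction

# Counts taken from the text: observed Ca2+ ions and molecules per asymmetric unit.
LONG_FORM = {"ca_sites": 19, "syt1": 3, "snare": 2}   # LCLS XFEL data
SHORT_FORM = {"ca_sites": 7, "syt1": 2, "snare": 1}   # APS synchrotron data


def scaled_expectation(reference, target, key):
    """Ca2+ sites expected in `target` if sites scaled linearly with molecule `key`.

    The linear-scaling assumption is illustrative only.
    """
    return Fraction(reference["ca_sites"] * target[key], reference[key])


if __name__ == "__main__":
    for key in ("syt1", "snare"):
        expected = scaled_expectation(LONG_FORM, SHORT_FORM, key)
        print(f"scaled by {key}: expected {float(expected):.1f}, "
              f"observed {SHORT_FORM['ca_sites']}")
```

Scaling by Syt1 gives 38/3 (about 12.7) sites and scaling by SNARE complexes gives 9.5; both exceed the 7 sites actually observed in the APS-derived maps.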

At present, it is difficult to compare the quality of the XFEL diffraction data
studied here with that of conventional rotation diffraction data measured at a
synchrotron. We suspect that the standard diffraction data statistics (such as
the merging R values) of rotation data are better owing to the ability to directly
measure full reflections (at least by summation of partials) without modelling
partiality, which is still a relatively crude process even with the latest
post-refinement approaches. In our opinion, these apparently poorer statistics
obtained from current state-of-the-art XFEL diffraction data processing methods
are more than offset by the improved quality of the resulting electron density
maps for diffraction data of crystals collected at the XFEL (Fig. 2c, d and
Extended Data Fig. 2c-f). These maps, superior to those of the short unit cell
crystal structure determined from data collected at the APS NE-CAT microfocus
synchrotron beamline at comparable limiting resolution (Extended Data Fig. 3b, c),
enabled us to better investigate the binding interfaces between Syt1 and the
SNARE complex, and were thus essential for the structural portion of our study.
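For reference, the merging R value mentioned above is conventionally defined as $R_{merge} = \sum_{hkl}\sum_i |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i I_i(hkl)$. The minimal sketch below implements only that definition; it assumes intensities have already been assigned to unique reflection indices and, for XFEL stills, already corrected for partiality (the difficult step discussed above, not shown). Skipping singly measured reflections is a common convention but is an assumption here, and the toy intensities are hypothetical.

```python
from statistics import mean


def r_merge(observations):
    """Merging R value from repeated intensity measurements.

    observations: dict mapping a unique reflection index (h, k, l) to the list of
    intensities measured for it (including symmetry equivalents).
    Reflections measured only once are skipped, since they contribute no spread.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        average = mean(intensities)
        numerator += sum(abs(i - average) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0:
        raise ValueError("no multiply measured reflections")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical toy intensities, not data from this study.
    toy = {(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0, 8.0], (1, 1, 1): [100.0]}
    print(f"R_merge = {r_merge(toy):.3f}")
```

Lower values indicate better agreement between repeated measurements; for stills, the quality of the partiality model directly limits how low this statistic can go.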

Validation and structure analysis. MolProbity was used for evaluating the
geometry and quality of the models (Extended Data Table 1). The electrostatic
potential maps were calculated and displayed using the UCSF Chimera package
(Chimera is developed by the Resource for Biocomputing, Visualization, and
Informatics at the University of California, San Francisco (supported by
NIGMS P41-GM103311)). All other structure figures were prepared with
PyMOL (DeLano, 2002, The PyMOL Molecular Graphics System, http://www.pymol.org,
Schrödinger, LLC). Interface areas were calculated by PISA; note
that the commonly used ‘buried surface area’ is twice the ‘interface area’.
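The factor of two follows from how the interface area is defined: PISA reports the interface area as half of the solvent-accessible surface area (SASA) that becomes buried when the two partners associate. A minimal sketch, with hypothetical function names and hypothetical SASA values (not taken from the Syt1-SNARE structures):

```python
def interface_area(sasa_a, sasa_b, sasa_complex):
    """PISA-style interface area: half of the solvent-accessible surface area (SASA)
    buried when partners A and B associate. All areas in A^2."""
    buried = sasa_a + sasa_b - sasa_complex
    if buried < 0:
        raise ValueError("complex SASA exceeds the sum of the isolated partners")
    return buried / 2.0


def buried_surface_area(interface):
    """The 'buried surface area' often quoted in the literature is twice the interface area."""
    return 2.0 * interface


if __name__ == "__main__":
    # Hypothetical SASA values, for illustration only.
    ia = interface_area(sasa_a=12000.0, sasa_b=15000.0, sasa_complex=25400.0)
    print(ia, buried_surface_area(ia))  # 800.0 1600.0
```

When comparing interface sizes across studies, it is therefore worth checking which of the two quantities is being quoted.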
Design of mutations to disrupt the primary interface. We mutated combinations
of region I interacting residues (Syt1 E295A/Y338W and SNAP-25 K40A/D166A)
and region II interacting residues (Syt1 R281A/R398A/R399A and
SNAP-25 D51A/E52A/E55A). We verified that the Syt1 mutants are properly
folded by measuring circular dichroism (CD) spectra and thermal denaturation
curves (Extended Data Fig. 7a), by determining the crystal structure of the Syt1
C2B quintuple mutant (R281A/E295A/Y338W/R398A/R399A) (Extended Data
Fig. 7b-d and Extended Data Table 1), and by size-exclusion chromatography of
Syt1 and its mutants (Extended Data Fig. 7e). The crystal structure of the
quintuple mutant is very similar to that of wild-type C2B (PDB code 2YOA)
(main-chain r.m.s.d. 0.43 Å, Extended Data Fig. 7b). We also established that the
mutated SNARE complexes form properly (Extended Data Fig. 7f).
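The text does not state which program produced the 0.43 Å main-chain r.m.s.d. A common way to compute such a value is the Kabsch algorithm: remove translation by centring both sets of matched atoms, find the optimal rotation by singular value decomposition, then take the root-mean-square deviation. The sketch below assumes NumPy is available and that the two coordinate arrays list the same main-chain atoms in the same order; the function name and toy coordinates are hypothetical.

```python
import numpy as np


def superposed_rmsd(p, q):
    """r.m.s.d. between matched atom sets p and q (N x 3) after optimal rigid
    superposition of p onto q (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Guard against an improper rotation (reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    # Toy "main-chain" coordinates (A), hypothetical.
    p = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                  [0.0, 1.5, 1.5], [2.0, 3.0, 1.0]])
    theta = np.radians(30.0)
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    q = p @ rz.T + np.array([5.0, -2.0, 3.0])  # rotated and translated copy
    print(f"{superposed_rmsd(p, q):.3f}")  # ~0 for a rigid-body copy
```

A rigid-body copy superposes to essentially zero r.m.s.d., so any residual value reflects genuine conformational differences between the matched atoms.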

A subset of the interacting residues of region II were suggested to be function- 
ally important in previous studies: Syt1 R398/R399 (refs 35, 36, 78) and SNAP-25 
D51/E52/E55 (refs 27, 79, 80). However, interactions in region I have not been 
implicated in any previous studies. 

CD spectroscopy of wild type and mutants of Syt1 C2B. CD measurements
were conducted with a CD spectrometer (Model 202-01, Aviv Biomedical, Inc.)
equipped with a temperature controller. Data were collected with 10 μM
samples of wild-type and mutant Syt1 C2B proteins in 20 mM Tris (pH 8.0),
150 mM NaCl buffer (without or with 5 mM CaCl₂) over a wavelength range of
195 nm to 260 nm, with 1 nm increments, in a 1 mm path length cell at 25 °C.
Temperature denaturation experiments were performed at a wavelength of
216 nm by increasing the temperature from 25 °C to 100 °C in 3 °C
increments, with a 2 min temperature equilibration time and a 3 s averaging
time. The fraction of unfolded protein at each temperature was calculated using
the formula $f_U = (I_{obs} - I_F)/(I_U - I_F)$, where $I_{obs}$ is the observed
mean residue ellipticity, and $I_U$ and $I_F$ are the mean residue ellipticities of
the unfolded and folded states, respectively. $I_U$ and $I_F$ were estimated by
extrapolation of the linear regions at the extremes of the denaturation curves.
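The baseline extrapolation and the $f_U$ formula above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes the first `n_folded` and last `n_unfolded` points of the curve lie in the linear folded and unfolded regions (both set to five here as a hypothetical choice), fits a least-squares line to each, and evaluates both lines at every temperature.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fraction_unfolded(temps, ellipticity, n_folded=5, n_unfolded=5):
    """f_U = (I_obs - I_F) / (I_U - I_F), with I_F and I_U taken from straight
    lines fitted to the first n_folded and last n_unfolded points (the linear
    regions at the extremes) and extrapolated to every temperature."""
    a_f, b_f = linear_fit(temps[:n_folded], ellipticity[:n_folded])
    a_u, b_u = linear_fit(temps[-n_unfolded:], ellipticity[-n_unfolded:])
    result = []
    for t, i_obs in zip(temps, ellipticity):
        i_f = a_f + b_f * t
        i_u = a_u + b_u * t
        result.append((i_obs - i_f) / (i_u - i_f))
    return result


if __name__ == "__main__":
    temps = list(range(25, 101, 3))  # 25-100 degC in 3 degC steps, as in the text

    # Synthetic, hypothetical data: sloping baselines plus a transition from 37 to 88 degC.
    def true_f(t):
        return min(1.0, max(0.0, (t - 37) / (88 - 37)))

    data = [(-12000 + 20 * t) + true_f(t) * ((-3000 + 10 * t) - (-12000 + 20 * t))
            for t in temps]
    fu = fraction_unfolded(temps, data)
    print(f"f_U at {temps[13]} degC: {fu[13]:.3f}")
```

Fitting sloping baselines, rather than using single plateau values, keeps temperature-dependent drift of the folded and unfolded signals from being misread as unfolding.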

Sequence alignment. The alignment was performed using ClustalW2
(http://www.ebi.ac.uk/Tools/msa/clustalw2/) and the figures were prepared with
BoxShade 3.21 (http://www.ch.embnet.org/software/BOX_form.html). List of
UniProt or GenBank accession numbers of Syt1 homologues: Caenorhabditis
elegans (worm Syt1, P34693); Drosophila melanogaster (fly Syt1, P21521);
Lymnaea stagnalis (snail Syt1, AAO83847.1); Doryteuthis pealeii (squid Syt1,
BAA09866.1); Danio rerio (zebrafish Syt1, XP_005164929.1; zebrafish Syt2,
XP_009294914.1; zebrafish Syt9, AAI52175.1); Xenopus tropicalis (frog Syt1,
XP_002935685.2); Chelonia mydas (turtle Syt1, XP_007057315.1); Gallus gallus
(chicken Syt1, P47191); Rattus norvegicus (rat Syt1, P21707; rat Syt2, P29101; rat
Syt9, P47861; rat Syt3, P40748; rat Syt4, P50232; rat Syt5, Q925C0; rat Syt6,
Q62746; rat Syt7, Q62747; rat Syt8, Q925B4; rat Syt10, O08625; rat Syt11,
O08835; rat Syt12, P97610; rat Syt13, Q925B5; rat Syt14, M0R7W7; rat Syt15,
P59926; rat Syt16, D3ZB68); Homo sapiens (human Syt1, P21579; human Syt2,
Q8N9I0; human Syt9, O00445). List of UniProt or GenBank accession numbers
of SNAP-25 and syntaxin homologues: Caenorhabditis elegans (worm SNAP-25,
NP_505641.2; worm syntaxin-1A, O16000); Drosophila melanogaster (fly
SNAP-25, P36975; fly SNAP-29, NP_523831.1; fly syntaxin-1A, Q24547); Lymnaea
stagnalis (snail syntaxin-1A, AAO83845.1); Doryteuthis pealeii (squid SNAP-25,
AAM18191.1; squid syntaxin, CAA74913.1); Danio rerio (zebrafish SNAP-25,
NP_001020729.1; zebrafish SNAP-29, NP_001243185.1; zebrafish syntaxin-1B,
Q9I9P6); Xenopus laevis (frog SNAP-25, XP_005287463.1); Xenopus tropicalis
(frog syntaxin-1A, NP_001072191.1); Chrysemys picta bellii (turtle SNAP-25,
XP_007057315.1; turtle syntaxin-1A, XP_005294403.1); Gallus gallus (chicken
SNAP-25, P60878; chicken syntaxin-1B, F5HN09); Rattus norvegicus (rat
SNAP-25a, P60881-2; rat SNAP-25b, P60881-1; rat SNAP-23, O70377; rat
SNAP-29, Q9Z2P6; rat syntaxin-1A, P32851; rat syntaxin-1B, P61266; rat
syntaxin-2, P50279-2; rat syntaxin-3, Q08849; rat syntaxin-4, Q08850; rat
syntaxin-5, Q08851-2; rat syntaxin-7, O70257); Homo sapiens (human SNAP-25a,
P60880-2; human SNAP-25b, P60880-1; human SNAP-23, O00161; human
SNAP-29, O95721; human syntaxin-1A, Q16623-1); Saccharomyces cerevisiae
(yeast sec-9, P40357; yeast sso-1, P32867).

Protein expression and purification for single vesicle-vesicle experiments.
Full-length cysteine-free rat synaptobrevin-2, rat syntaxin-1A, rat SNAP-25A
(with all endogenous cysteines changed to serines), rat Syt1 (with all endogenous
cysteines changed to alanine, except the cysteine residue at position 277), and
wild-type rat complexin-1 were expressed and purified as previously described.
All mutants of Syt1 and SNAP-25 were generated using the QuikChange
Site-Directed Mutagenesis kit (Agilent) and expressed and purified using the
same protocol.

Protein reconstitution for single vesicle-vesicle fusion experiments. Neuronal
SNAREs and Syt1 represent a minimal system for Ca²⁺-triggered membrane
fusion. Recent evidence for this notion came from a reconstituted single
vesicle-vesicle assay that discriminates among vesicle association, lipid mixing
and content mixing. Moreover, addition of complexin-1 greatly enhanced the
amplitude and synchronization of Ca²⁺-triggered release, and suppressed
spontaneous release, in this system. A variety of complexin-1 truncations and
mutations qualitatively reproduced effects observed in neuronal cultures for both
spontaneous and Ca²⁺-triggered release, lending credence to the use of this
reconstituted system to investigate mechanistic questions.

We used the same membrane compositions and protein densities as in our
previous studies. Likewise, the reconstitution protocol was similar, with
several changes as described in ref. 38. Briefly, one class of vesicles was
reconstituted with both Syt1 (or its mutants) and synaptobrevin-2 to mimic
synaptic vesicles (referred to as SV vesicles), while another class of vesicles was
reconstituted with syntaxin-1A and SNAP-25 (or its mutants) to mimic plasma
membranes (referred to as PM vesicles), using the previously described lipid
compositions. The protein-to-lipid ratios used were 1:200 for synaptobrevin-2
and syntaxin-1A, and 1:1,000 for Syt1 and its mutants. A three- to fivefold
excess of SNAP-25 and its mutants (with respect to syntaxin-1A) and 3.5
mol% PIP₂ were added to the protein-lipid mixture for PM vesicles only.
Dried lipid films were dissolved in 110 mM β-octyl glucoside (β-OG) buffer
containing purified proteins. Detergent-free buffer (20 mM HEPES-Na, pH
7.4, 90 mM NaCl, 0.1% 2-mercaptoethanol) was then added to the protein-lipid
mixture until the β-OG concentration reached the critical micelle concentration
(24.4 mM). The vesicles were then subjected to size-exclusion chromatography
using a Sepharose CL-4B column, packed under near-constant pressure by
gravity with a peristaltic pump (GE Healthcare) in a 5 ml column with a 2 ml
bed volume and equilibrated with buffer V (20 mM HEPES-Na, pH 7.4,
90 mM NaCl, 20 μM EGTA, 0.1% 2-mercaptoethanol), followed by dialysis
into 2 l of detergent-free buffer V supplemented with 5 g of Bio-Beads SM-2
and 0.8 g l⁻¹ Chelex 100 resin (Bio-Rad, Life Science Research). After 4 h, the
buffer was replaced with 2 l of fresh buffer V containing Bio-Beads and Chelex,
and dialysis continued for 12 h. During the preparation of SV vesicles, 50 mM
sulforhodamine B (Invitrogen) was present in all solutions before the
size-exclusion chromatography step. As described previously, the presence and
purity of reconstituted proteins were confirmed by SDS-PAGE of the vesicle
preparations, and the directionality of the membrane proteins (facing outward)
was assessed by chymotrypsin digestion followed by SDS-PAGE. The size
distributions of the SV and PM vesicles were analysed by cryo-EM, as described
previously.
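Two pieces of arithmetic in this protocol are easy to get wrong at the bench: the volume of detergent-free buffer needed to bring β-OG from 110 mM down to its critical micelle concentration (24.4 mM), and the protein amounts implied by the 1:200 and 1:1,000 protein-to-lipid ratios. The sketch below assumes ideal, volume-additive mixing; the function names, the 100 μl starting volume and the 2,000 nmol lipid amount are hypothetical values for illustration only.

```python
def buffer_volume_to_add(c_start_mm, c_target_mm, v_start_ul):
    """Volume of detergent-free buffer that dilutes a detergent from c_start to
    c_target, assuming ideal (volume-additive) mixing."""
    if not 0 < c_target_mm < c_start_mm:
        raise ValueError("target must be positive and below the starting concentration")
    return v_start_ul * (c_start_mm / c_target_mm - 1.0)


def protein_nmol(lipid_nmol, lipids_per_protein):
    """Protein amount for a protein-to-lipid ratio of 1:lipids_per_protein."""
    return lipid_nmol / lipids_per_protein


if __name__ == "__main__":
    # Hypothetical starting volume and lipid amount, for illustration only.
    v0, lipid = 100.0, 2000.0  # ul of 110 mM beta-OG mixture; nmol total lipid
    added = buffer_volume_to_add(110.0, 24.4, v0)
    print(f"add {added:.1f} ul buffer")                                          # ~350.8 ul
    syntaxin = protein_nmol(lipid, 200)
    print("synaptobrevin-2 or syntaxin-1A:", syntaxin, "nmol")                   # 10.0
    print("Syt1:", protein_nmol(lipid, 1000), "nmol")                            # 2.0
    print("SNAP-25 (3-5x syntaxin-1A):", 3 * syntaxin, "to", 5 * syntaxin, "nmol")
```

Because β-OG has a relatively high critical micelle concentration, roughly 3.5 volumes of buffer are needed per volume of the starting mixture before the chromatography step.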
Single vesicle-vesicle content-mixing assay. We used the single vesicle-vesicle
assay described in ref. 38. Briefly, SV vesicles were labelled with a soluble
fluorescent content dye (sulforhodamine B) at a moderately self-quenching
concentration; for simplicity, in this work we did not include a lipid dye, since
we were exclusively interested in the exchange of content, the correlate of
neurotransmitter release. The PM vesicles were immobilized on a surface that
was passivated with polyethylene glycol (PEG) and functionalized via
streptavidin-biotin linkages. SV vesicles were then added in the presence of
2 μM complexin-1. We immediately started monitoring the arrival of SV vesicles
at surface-immobilized PM vesicles during the first 1-min acquisition period. A
stepwise increase in fluorescence emission of a spot in the field of view indicated
the formation of an SV-PM vesicle pair during this vesicle association period.

Unbound SV vesicles were then removed through extensive washing with
vesicle-free buffer, while real-time observation of the fluorescence intensity
continued; consequently, we did not observe any additional SV-PM vesicle
associations after the washing step. While observation continued for another
1-min period, a second stepwise increase of fluorescence intensity appeared for
some fraction of the associated SV vesicles, which indicated Ca²⁺-independent,
that is, spontaneous fusion events (referred to as the spontaneous fusion period).
Next, we injected 500 μM Ca²⁺ solution and continued monitoring for another
minute, referred to as the Ca²⁺-triggered fusion period. For associated SV vesicles
that did not undergo spontaneous fusion during the second period, a stepwise
increase in fluorescence intensity during the third period indicated a
Ca²⁺-triggered fusion event. To determine the time of arrival of Ca²⁺ in the
evanescent field of our TIR microscope setup, soluble Cy5 dye was added with the
Ca²⁺ buffer, and the emergence of Cy5 fluorescence intensity was monitored.
Thus, our improved single vesicle-vesicle assay enables one to monitor the
association of SV vesicles and both spontaneous and Ca²⁺-triggered fusion
events during the same data acquisition. Further details can be found in ref. 38.
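The three-period logic described above can be expressed as a simple classification of each spot's stepwise intensity increases. The sketch below is an illustration under stated assumptions, not the analysis code of ref. 38: step detection itself is not shown, the period end times (60, 120 and 180 s) and all names (`Periods`, `classify_spot`, `ca_triggered_fraction`) are hypothetical, and any extra steps occurring within the association period after the first one are simply ignored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Periods:
    """End times (s) of the three acquisition periods; values are hypothetical."""
    association_end: float   # SV vesicles arrive at immobilized PM vesicles
    spontaneous_end: float   # after washing, before Ca2+ injection
    triggered_end: float     # after Ca2+ injection


def classify_spot(step_times, periods):
    """Classify one spot from the times (s) of its stepwise intensity increases."""
    steps = sorted(step_times)
    # First step = arrival of an SV vesicle; it must occur before the wash.
    if not steps or steps[0] > periods.association_end:
        return "no association"
    # The first step after the association period is read as content mixing;
    # additional steps inside the association period are ignored in this sketch.
    later = [t for t in steps[1:] if t > periods.association_end]
    if not later:
        return "no fusion"
    if later[0] <= periods.spontaneous_end:
        return "spontaneous"
    if later[0] <= periods.triggered_end:
        return "Ca2+-triggered"
    return "no fusion"


def ca_triggered_fraction(counts):
    """Ca2+-triggered events among associated vesicles that did not fuse spontaneously."""
    eligible = counts["Ca2+-triggered"] + counts["no fusion"]
    return counts["Ca2+-triggered"] / eligible if eligible else 0.0


if __name__ == "__main__":
    periods = Periods(association_end=60.0, spontaneous_end=120.0, triggered_end=180.0)
    spots = [[10.0, 75.0], [12.0, 130.0], [20.0], [], [5.0, 140.0]]  # hypothetical
    counts = Counter(classify_spot(s, periods) for s in spots)
    print(dict(counts), ca_triggered_fraction(counts))
```

Normalizing Ca²⁺-triggered events by the vesicles that were still unfused at the time of Ca²⁺ injection mirrors the text's statement that only associated vesicles that did not fuse spontaneously are scored in the third period.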

Previous reconstitutions often employed lipid mixing, rather than content
mixing, between vesicles with reconstituted neuronal SNAREs. For example,
increased ensemble lipid mixing was observed upon addition of the soluble Syt1
C2AB fragment and Ca²⁺ (ref. 83). However, in retrospect, this result was
probably caused by an effect on vesicle association, since multivalent binding of
the soluble Syt1 C2AB fragment can induce vesicle clustering. Moreover,
subsequent work revealed major differences between using the soluble C2AB
fragment and reconstituted, full-length Syt1 (refs 84, 85). More importantly,
assays based on lipid mixing alone can produce misleading results, since lipid
mixing can occur without content mixing. Our single vesicle-vesicle assay
therefore uses full-length Syt1 and monitors content mixing, a correlate of
neurotransmitter release.

Syt1 conditional knockout mice. Syt1 conditional knockout mice were gener- 
ated by the European Conditional Mouse Mutagenesis Program (EUCOMM) 
and are available from the European Mouse Mutant Archive (EMMA) 
(EM:06829). Exon 2 of Syt1, which contains the transmembrane domain, is floxed,
and its removal results in a frameshift that produces a truncated protein. The
Syt1 targeted mice were first crossed to FLPe mice to remove a gene trap cassette
flanked by frt sites. This cross yielded the conditional knockout mice as
schematically shown in Fig. 3a. Exposure to Cre recombinase results in the total
absence of synchronous release, which is typical for Syt1 knockout neurons
(Fig. 4a, b).

Neuronal cultures. Neuronal cultures were produced from wild-type and Syt1 
conditional knockout mice as previously described. Hippocampi were dissected
from P0 pups, dissociated by papain digestion, and plated on Matrigel-coated
glass coverslips. Neurons were cultured in vitro in MEM supplemented with B27 
(Gibco), glucose, transferrin, fetal bovine serum and Ara-C (Sigma), and were 
analysed after 14-16 days. 

Lentivirus production. For rescue experiments, we used a lentiviral construct 
carrying a synapsin promoter, an optional rat Syt1 cDNA, internal ribosome
entry site (IRES), and a GFP-Cre recombinase fusion sequence. The control 
plasmid (TB592) contained no rescuing cDNA, with rescuing plasmids carrying 
the following cDNAs: TB761 (wild type), TB762 (R398Q/R399Q), TB765 
(E295A/Y338W), TB767 (R281A/R398A/R399A), TB777 (R281A/E295A/ 
Y338W/R398A/R399A). To make viruses, human embryonic kidney 293T cells 
were co-transfected with the lentiviral vector and three packaging plasmids. 
Supernatant containing the viruses was collected 48 h after transfection and 
was used to infect hippocampal neuronal cultures at day in vitro (DIV) 4.
Cultures were used for biochemical or physiological analyses at DIV 14-16. 
Electrophysiological recordings in cultured neurons. Recordings were
performed essentially as previously described. The whole-cell pipette solution
contained 40 mM CsCl, 90 mM K-gluconate, 1.8 mM NaCl, 1.7 mM MgCl₂,
3.5 mM KCl, 0.05 mM EGTA, 10 mM HEPES, 2 mM Mg-ATP, 0.4 mM Na-GTP,
10 mM phosphocreatine, and 10 mM QX-314 (pH 7.4, adjusted with
CsOH). The bath solution contained 140 mM NaCl, 5 mM KCl, 2 mM CaCl₂,
2 mM MgCl₂, 10 mM HEPES and 10 mM glucose (pH 7.4, adjusted with NaOH).
Evoked synaptic responses were triggered by a bipolar electrode.
GABAA-receptor-mediated IPSCs were pharmacologically isolated with CNQX
(20 μM) and AP-5 (50 μM) in the bath solution and recorded at a −70 mV holding
potential. Because the intracellular solution contains a high Cl⁻ concentration,
IPSCs appear as large inward currents.
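Why a high pipette Cl⁻ concentration turns IPSCs into inward currents at −70 mV follows from the Nernst equation. The sketch below counts only the chloride salts listed in the two solutions. It assumes that no other component (including QX-314, whose counterion is not stated) contributes Cl⁻, and it assumes a recording temperature of 25 °C; the function and constant names are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618   # J mol^-1 K^-1
FARADAY = 96485.33212        # C mol^-1


def nernst_mv(c_out_mm, c_in_mm, valence, temp_c=25.0):
    """Nernst equilibrium potential in mV for an ion of the given valence."""
    temp_k = temp_c + 273.15
    return 1000.0 * GAS_CONSTANT * temp_k / (valence * FARADAY) * math.log(c_out_mm / c_in_mm)


# Chloride contributed by the chloride salts listed in the text (mM).
# Assumption: QX-314 and the other components contribute no Cl-.
CL_IN = 40 + 1.8 + 2 * 1.7 + 3.5    # CsCl, NaCl, MgCl2, KCl (about 48.7)
CL_OUT = 140 + 5 + 2 * 2 + 2 * 2    # NaCl, KCl, CaCl2, MgCl2 (153)

if __name__ == "__main__":
    e_cl = nernst_mv(CL_OUT, CL_IN, valence=-1)
    driving_force = -70.0 - e_cl  # holding potential minus E_Cl
    print(f"E_Cl = {e_cl:.1f} mV, driving force at -70 mV = {driving_force:.1f} mV")
```

Under these assumptions E_Cl is about −29 mV, well positive of the −70 mV holding potential. Opening GABAA-receptor Cl⁻ channels therefore lets Cl⁻ leave the cell, which is recorded as an inward (negative) current.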

Immunoprecipitation and quantitative immunoblotting. Cultured Syt1
conditional knockout neurons infected with viruses expressing the desired mutants
were solubilized for 1 h in PBS (with 1 mM CaCl₂, 0.2% Triton X-100, pH 7.4)
supplemented with EDTA-free protease inhibitor cocktail (Roche). The lysate
was cleared by centrifugation at 16,000g for 10 min at 4 °C, and
immunoprecipitation was performed by incubating with polyclonal antibodies to
syntaxin-1 (438B) or preimmune sera for 1 h at 4 °C, followed by incubation with
15 μl of a 50% slurry of protein-A Sepharose beads (GE Healthcare) for 2 h at
4 °C. Beads were washed 5 times with 1 ml extraction buffer, and bound proteins
were eluted with 2× SDS sample buffer containing 100 mM DTT and boiled for
20 min at 100 °C.

Co-precipitated proteins were separated by SDS-PAGE followed by detection
with monoclonal antibodies against rat Syt1 (604.4, Synaptic Systems; this
antibody does not detect mouse Syt1) and synaptobrevin-2 (cl. 69.1, Synaptic
Systems). To allow for quantitative detection, dye-conjugated secondary
antibodies were used (IRDye 800CW Donkey anti-Mouse IgG, Li-cor), membranes
were scanned in an Odyssey scanner (Li-cor), and quantification was performed
using Image Studio software (Li-cor). All experiments included a Syt1 wild-type
group in addition to the desired mutants. The Syt1/synaptobrevin-2 signal ratio
was determined and normalized to the Syt1 wild-type condition, which was
set equal to 1.
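The normalization step can be sketched as follows: compute the Syt1/synaptobrevin-2 ratio for each condition, then divide every ratio by the wild-type ratio so that wild type equals 1. The function name and the band intensities in the example are hypothetical, not data from this study.

```python
def normalized_ratios(signals, reference="WT"):
    """Syt1/synaptobrevin-2 signal ratio per condition, normalized so that the
    reference (wild-type) condition equals 1.

    signals: {condition: (syt1_signal, synaptobrevin2_signal)}
    """
    ratios = {}
    for condition, (syt1, syb2) in signals.items():
        if syb2 <= 0:
            raise ValueError(f"non-positive synaptobrevin-2 signal for {condition}")
        ratios[condition] = syt1 / syb2
    ref = ratios[reference]
    return {condition: ratio / ref for condition, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical integrated band intensities (arbitrary units).
    example = {"WT": (2000.0, 1000.0), "E295A/Y338W": (500.0, 1000.0)}
    print(normalized_ratios(example))  # {'WT': 1.0, 'E295A/Y338W': 0.25}
```

Using co-precipitated synaptobrevin-2 as the denominator corrects for differences in how much SNARE complex was pulled down in each sample. Including a wild-type group in every experiment allows each blot to be normalized internally.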
Extended Data Figure 1 | Purification and crystallization of the Syt1-SNARE
complex. a, Diagram of the Duet co-expression vectors (Novagen)
that express the fragments of the neuronal SNARE complex and the
C2AB-linker-SNAP-25_C chimaera used for purification and crystallization of the
Syt1-SNARE complex. The rat syntaxin-1A and His-tagged rat
synaptobrevin-2 fragments were cloned into the vector pACYCDuet-1; the
C2AB-linker-SNAP-25_C chimaera and the SNAP-25_N fragment were
cloned into the vector pETDuet-1, with amino acid ranges labelled. Dashed lines
represent the 37-amino-acid linker (see Methods). b, The purified
Syt1-SNARE complex eluted as a single peak during size-exclusion
chromatography (profile on the left). Left gel: Coomassie-blue-stained
SDS-PAGE gel of the purified Syt1-SNARE complex (unboiled and boiled).
Right gel: Coomassie-blue-stained SDS-PAGE gel of dissolved crystals of
the Syt1-SNARE complex that were grown over a period of 2 months starting
from purified Syt1-SNARE complex (unboiled and boiled). Although Syt1
was initially covalently linked to SNAP-25_C, the linker was cleaved during
crystallization. The difference between the boiled and unboiled lanes is a
hallmark of a fully formed neuronal SNARE complex. c, Boiled
Coomassie-blue-stained SDS-PAGE gel of the purified Syt1-SNARE
complex in solution at ambient temperature at the specified times after
purification. Cleavage is apparent on day one and progresses slowly over several
days. d, Schema showing the commonly used vapour-diffusion technique: the
drop contains a lower concentration of the precipitant than the reservoir.
The crystallization of the quintuple mutant of Syt1 C2B is used as an example.
e, Schema showing a reverse vapour-diffusion method that was used for
crystallization of the Ca²⁺-bound Syt1-SNARE complex: the drop contains a
higher concentration of the precipitant than the reservoir.
Extended Data Figure 2 | Diffraction images, electron density maps and
crystal packing of the Syt1-SNARE complex in the long unit cell crystal
form. a, Only one out of 85 screened crystals in the long unit cell crystal form
diffracted to 4.1 Å resolution at the APS NE-CAT microfocus synchrotron
beamline (a total of 105 crystals were screened, of which 20 indexed in the short
unit cell crystal form). b, A total of 61 out of ~72 crystals in the long unit
cell crystal form diffracted to at least 3.5 Å resolution at the LCLS XFEL (a total
of 148 crystals were exposed, of which 113 produced 578 images that could be
processed; 35 crystals did not diffract or showed multiple lattices). These
exposures were taken along the crystal c axis. The upper-left pictures in a and b
show images of loop-mounted crystals after X-ray exposure.
c, mFo − DFc annealed omit map (Methods) of the Ca²⁺-bound Syt1-SNARE
complex in the long unit cell crystal form using diffraction data collected at
the LCLS XFEL; omitted residues within region I of the primary interface
(residues 335-340 in Syt1 and 159-166 in SNAP-25) are coloured cyan. The
contour level is 2.3σ. d-f, Representative 2mFo − DFc electron density maps
of the Ca²⁺-bound Syt1-SNARE complex in the long unit cell crystal form
using diffraction data collected at the LCLS XFEL. The contour level is 1.5σ.
g, Views of the crystal lattice perpendicular to the bc (left) and ac (right)
planes of the Ca²⁺-bound Syt1-SNARE complex in the long unit cell
crystal form. The layer shown on the right corresponds to the red
arrowhead in the left panel (only a slice corresponding to the layer is shown,
creating the appearance of two disconnected groups of molecules; these
groups are actually connected via interactions with the neighbouring layers).
The red dashed oval indicates the ‘missing’ Syt1 C2AB fragment compared
to the short unit cell crystal form (Extended Data Fig. 3d).
Extended Data Figure 3 | Asymmetric unit, electron density maps and
crystal packing of the Syt1-SNARE complex in the short unit cell crystal
form. a, Asymmetric unit of the Ca²⁺-bound Syt1-SNARE complex in the
short unit cell crystal form at 3.6 Å resolution using diffraction data collected
at the APS NE-CAT microfocus synchrotron beamline (Extended Data
Table 1). The colour code is the same as in Fig. 1c. Two Syt1 C2AB fragments
(distinguished by the designators I and I′) bind to the same SNARE complex
in the asymmetric unit (see schema). b, mFo − DFc annealed omit map
(Methods) of the Ca²⁺-bound Syt1-SNARE complex in the short unit cell
crystal form collected at the APS NE-CAT microfocus synchrotron beamline;
omitted residues within region I of the primary interface (residues 335-340
in Syt1 and 159-166 in SNAP-25) are coloured cyan. The contour level is 2.3σ.
Left side, without B-factor sharpening; right side, with B-factor sharpening.
c, Representative 2mFo − DFc electron density map of the Ca²⁺-bound
Syt1-SNARE complex for the short unit cell crystal form using diffraction data
collected at the APS NE-CAT microfocus synchrotron beamline. The contour
level is 1.5σ. Left side, without B-factor sharpening; right side, with B-factor
sharpening. b, c, The sharpening B-factor (−55 Å²) was set to make the
lowest atomic B-factor of the short unit cell crystal form comparable to that of
the long unit cell crystal form. Even with B-factor sharpening, the electron
density map of the long unit cell crystal form collected at the LCLS XFEL is
superior to that of the short unit cell crystal form collected at the APS NE-CAT
microfocus synchrotron beamline. d, Views of the crystal lattice perpendicular
to the bc (left) and ac (right) planes of the Ca²⁺-bound Syt1-SNARE
complex in the short unit cell crystal form. The layer shown on the
right corresponds to the red arrowhead in the left panel. The unit cell is
outlined by a black box.
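The effect of the −55 Å² sharpening B-factor can be made concrete. Sharpening multiplies each structure-factor amplitude by $\exp(-B_{sharp} s^2/4)$ with $s = 1/d$, which is equivalent to lowering every atomic B-factor by 55 Å² and boosts high-resolution terms most strongly. The sketch below uses only the −55 Å² value from the caption; the function name and the d-spacings printed are illustrative choices, and real programs may additionally apply resolution cut-offs or other filters.

```python
import math


def sharpening_factor(d_spacing, b_sharp):
    """Amplitude multiplier exp(-B_sharp * s^2 / 4), with s = 1/d in 1/A and
    B_sharp in A^2. A negative B_sharp boosts high-resolution terms (sharpening)."""
    s = 1.0 / d_spacing
    return math.exp(-b_sharp * s * s / 4.0)


if __name__ == "__main__":
    for d in (10.0, 6.0, 4.0, 3.6):
        print(f"d = {d:4.1f} A  factor = {sharpening_factor(d, -55.0):.2f}")
```

At the 3.6 Å limit of the short unit cell data the amplitudes are scaled up almost threefold, while low-resolution terms change little. This is why sharpening enhances side-chain detail but also amplifies noise at high resolution.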
Extended Data Figure 4 | Single-molecule FRET efficiency distributions of
the Syt1-SNARE complex versus FRET efficiency values calculated from the
Syt1-SNARE interfaces observed in the crystal structure. Shown are
histograms of intermolecular single-molecule FRET (smFRET) efficiency values
that were measured between pairs of covalently attached organic labels on
the Syt1 C2AB fragment and the SNARE complex (also shown as large
spheres superimposed on the interfaces observed in the crystal structure).
Arrowheads indicate FRET efficiencies calculated from the crystal structure
of the Ca²⁺-bound Syt1-SNARE complex in the long unit cell crystal form
(complex I) for the primary, secondary and tertiary interfaces, using the
methods and approximations described in ref. 28 to simulate the positions of
dye centres in order to calculate the FRET efficiency values. Only the dye pair
combinations between the nearest C2 domain (including the C2A-C2B linker)
and the SNARE complex were calculated for the three interfaces. Note that,
owing to the presence of transitions between different states, the histograms
reflect a combined effect of interaction interfaces. The label at position
A61 would have disrupted the tertiary interface between the C2A domain and
the SNARE complex, explaining the discrepancy for these labels (indicated by
open triangles). In retrospect, the top smFRET-derived model and the
primary interface observed in the crystal structure differed mainly in the
orientation of the C2B domain. Moreover, the top smFRET-derived model
predicted the approximate location of the primary interface on the neuronal
SNARE complex (see Fig. 4c in ref. 28).
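The calculated FRET efficiencies rest on the Förster relation $E = 1/(1 + (r/R_0)^6)$ between the donor-acceptor distance $r$ and the Förster radius $R_0$. The sketch below shows only that relation plus a simple average over candidate dye-centre positions. It is not the procedure of ref. 28, which is not reproduced here. The averaging assumes dyes sample their positions quickly relative to the measurement. The $R_0$ of 54 Å, the coordinates and the function names are hypothetical.

```python
import math


def fret_efficiency(r, r0):
    """Forster efficiency E = 1 / (1 + (r/R0)^6) for a donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def mean_efficiency(donor_positions, acceptor_positions, r0):
    """Average E over all pairs of candidate donor and acceptor dye-centre positions
    (for example, positions sampled from each dye's accessible volume)."""
    effs = [fret_efficiency(math.dist(d, a), r0)
            for d in donor_positions for a in acceptor_positions]
    return sum(effs) / len(effs)


if __name__ == "__main__":
    R0 = 54.0  # A; hypothetical Forster radius, dye-pair dependent
    donors = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0)]          # hypothetical dye centres (A)
    acceptors = [(50.0, 0.0, 0.0), (52.0, -3.0, 4.0)]
    print(f"E = {mean_efficiency(donors, acceptors, R0):.2f}")
```

The sixth-power dependence makes E most sensitive to distance near $R_0$. This is one reason a single mislocated or interface-disrupting label, such as the one at A61, can produce a large discrepancy.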
Extended Data Figure 5 | Comparison of the two crystal forms and the
Ca²⁺- and Mg²⁺-bound crystal structures of the Syt1-SNARE complex.
a, Superposition of the primary interfaces of the Ca²⁺-bound Syt1-SNARE
complex structure in the long unit cell crystal form (gold and bright orange)
and in the short unit cell crystal form (white). The primary interface is very
similar in both crystal forms: the r.m.s.d. for the primary interface between
both crystal forms is 0.38 Å (bright orange) and 0.42 Å (white) for complex I
and complex II, respectively (including Cα atoms of the SNARE complex
and the Syt1 C2B (I) domain forming the interface). b, Superposition of
complex I in the long unit cell crystal form with the asymmetric unit of the short
unit cell crystal form, showing only the secondary interface (light-blue
shaded disk) between Syt1 C2B (I′) and the SNARE complex (I). The bottom
panels show close-up views of the secondary interface: left, interacting
residues (sticks and balls); right, the view in the left panel rotated by 90°.
The Syt1 C2B (I′) domain is rotated by 16° between the two crystal forms
and, as a consequence, the interactions between residues R281, K288 and
R398 of the Syt1 C2B (I′) domain and residues E224 and E228 of syntaxin-1A
are slightly changed. Notably, Syt1 residues R281, K288 and
R398 are involved in both the primary (Fig. 2) and secondary interfaces.
c, Superposition of complex I in the long unit cell crystal form with the
asymmetric unit of the short unit cell crystal form, showing all interfaces.

d, Superposition of the Ca2+-bound (white) and Mg2+-bound (black) crystal
structures of the Syt1-SNARE complex, both in the short unit cell crystal
form. The lower left panel shows a close-up view of the primary interface,
indicating that it is very similar in both the Ca2+- and Mg2+-bound crystal
structures. The Syt1 C2B domain that forms the secondary interface (light-blue
shaded disk) is rotated by 19° between the Ca2+- and Mg2+-bound complexes.
The lower right panel is a rotated view of the complex, also showing the
tertiary interface (light-green shaded disk) and the C2A-C2B interface that
involves a symmetry-related Syt1 C2A domain (I′) (grey shaded disk).

e, B-factor-coloured cartoon representations of the asymmetric units of the
Ca2+-bound long unit cell crystal form (top), the Ca2+-bound short unit cell
crystal form (bottom left), and the Mg2+-bound short unit cell crystal form
(bottom right) of the Syt1-SNARE complex. Note that the primary interfaces
have relatively low B-factors, similar to the majority of the structure, while parts
of the C2A and C2B domains involved in the secondary and tertiary
interfaces have higher B-factors, possibly indicating increased flexibility.
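The r.m.s.d. values in panel a summarize how far matched Cα atoms lie from each other once the two structures are superposed. The sketch below covers only that last step. It assumes that both coordinate lists are already superposed and listed in matching atom order. The optimal-rotation fit itself (for example, the Kabsch algorithm) is deliberately omitted, and the coordinates are invented.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    coordinates that are already superposed and matched atom by atom."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical Cα positions (Å) for the same three residues in two crystal forms
form_long = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
form_short = [(0.0, 0.3, 0.0), (3.8, -0.3, 0.0), (7.6, 0.0, 0.4)]
print(round(rmsd(form_long, form_short), 2))  # 0.34
```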
Extended Data Figure 6 | Sequence alignments of Syt1, SNAP-25 and
syntaxin-1A from different homologues. a, Sequence alignment of Syt1
homologues, showing the sequences around the primary interface of the Syt1-SNARE
complex. Note that rat Syt5 refers to UniProt ID Q925C0, zebrafish
Syt9 refers to GenBank accession number AAI52175, rat Syt9 refers to
UniProt ID P47861, and human Syt9 refers to UniProt ID O00445.

b, Electrostatic potential surfaces of the known crystal structures of
synaptotagmin-1, synaptotagmin-3, synaptotagmin-4 and synaptotagmin-7;
the dashed rectangles indicate the regions that correspond to the primary
interface regions I and II of the Syt1-SNARE complex. c, Sequence alignment of
different SNAP-25 homologues, showing the sequences around the primary
interface of the Syt1-SNARE complex. d, Sequence alignment of different
syntaxin homologues, showing a sequence range around the primary interface
of the Syt1-SNARE complex. In all panels, the interacting residues of the
primary interface are indicated by solid circles and coloured boxes for region I
(cyan) and region II (red/orange).
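The alignments above are used to judge how conserved the interface residues are across homologues. One simple way to put a number on per-column conservation is the Shannon entropy of each alignment column, where 0 means the residue is identical in every sequence. This is an illustrative stand-in, not the method used for the figure, and the aligned fragments below are invented.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored.
    0.0 means every sequence has the same residue (fully conserved)."""
    residues = [r for r in column if r != "-"]
    total = len(residues)
    counts = Counter(residues)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def conservation_profile(alignment):
    """Per-column entropy for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]

# Hypothetical aligned fragments from four homologues
aln = ["KRLE", "KRIE", "KKVE", "KRLD"]
print([round(h, 2) for h in conservation_profile(aln)])  # [0.0, 0.81, 1.5, 0.81]
```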
Extended Data Figure 7 | Syt1 mutants and SNARE complexes with SNAP-25
mutants are well folded. a, Top panels: CD spectra of wild-type and
mutant Syt1 C2B domains in the absence of Ca2+. Bottom panels: thermal
denaturation was monitored by molar ellipticity at a wavelength of 216 nm in
the absence of Ca2+ (black) and in the presence of 5 mM Ca2+ (red). The
specified melting temperatures were estimated as the mid-point of the melting
curves (Methods). b, Superposition of the Syt1 C2B domains from the Ca2+-bound
Syt1-SNARE complex in the short unit cell crystal form (gold), the
crystal structure of the quintuple mutant (R281A/E295A/Y338W/R398A/R399A)
of the Syt1 C2B domain (green), and the crystal structure of the isolated
Syt1 C2B domain (white, PDB code 2YOA). c, d, Representative 2mFo-DFc
electron density maps of the crystal structure of the quintuple mutant of the
Syt1 C2B domain (Extended Data Table 1) contoured at 2.0σ. The labels
refer to the mutated residues. e, Overlay of SEC profiles of full-length Syt1
mutant proteins used in the single vesicle-vesicle fusion assay (Fig. 3d-g).

f, Coomassie-blue-stained SDS-PAGE with and without boiling of neuronal
SNARE complexes formed by full-length SNAP-25 and its mutants,
syntaxin-1A and synaptobrevin-2, using the proteins that were used in the
single vesicle-vesicle fusion assay (Methods).
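Panel a reads melting temperatures off the mid-point of each thermal denaturation curve. The sketch below shows one simple way to locate that mid-point: linear interpolation at the half-way value of the normalized signal. It assumes that the signal changes monotonically and that the first and last readings are the folded and unfolded baselines. The Methods may use a different fitting procedure, and the readings below are invented.

```python
def melting_midpoint(temps, signal):
    """Temperature where the normalized signal crosses 0.5, by linear interpolation.

    Assumes temps increase and the first/last readings are the folded/unfolded baselines.
    """
    start, end = signal[0], signal[-1]
    fraction = [(s - start) / (end - start) for s in signal]
    for i in range(1, len(temps)):
        f0, f1 = fraction[i - 1], fraction[i]
        # A sign change of (f - 0.5) brackets the mid-point
        if (f0 - 0.5) * (f1 - 0.5) <= 0 and f0 != f1:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("signal never crosses the mid-point")

# Hypothetical molar ellipticity readings at 216 nm
temps = [20, 30, 40, 50, 60, 70, 80]
ellipticity = [-10.0, -9.8, -9.0, -6.0, -2.0, -1.0, -1.0]
print(melting_midpoint(temps, ellipticity))
```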
Extended Data Figure 8 | Probability of fusion versus time upon
500 µM Ca2+ injection and spontaneous fusion for Syt1 and SNAP-25
mutants. Shown are the data that were used to generate Fig. 3d-g. The number
of independent experiments and analysed events are provided in Extended
Data Table 2. a-d, Cumulative histograms of probability of fusion versus time
for Syt1 mutants upon 500 µM Ca2+ injection (a) and spontaneous fusion
(b), and SNAP-25 mutants upon 500 µM Ca2+ injection (c) and spontaneous
fusion (d). e-g, Control experiments: e, Ca2+-triggered fusion; f, spontaneous
fusion with 3 mM ATP, without SNAP-25 or Syt1; and g, mock injection
without Ca2+.
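The cumulative histograms in panels a-d accumulate fusion events over time. The sketch below shows how such a curve can be built. It assumes that the curve is normalized to the number of analysed vesicle pairs, which is an assumption because the normalization used for the figure is not stated here. The event times are invented.

```python
from bisect import bisect_right

def cumulative_fusion_probability(event_times, n_pairs, time_points):
    """Fraction of analysed vesicle pairs that have fused by each time point.

    event_times: times (s) at which individual pairs fused; n_pairs: all analysed pairs.
    """
    ordered = sorted(event_times)
    return [bisect_right(ordered, t) / n_pairs for t in time_points]

# Hypothetical data: 4 of 10 vesicle pairs fuse after Ca2+ injection at t = 0 s
events = [0.2, 0.4, 0.4, 3.0]
print(cumulative_fusion_probability(events, 10, [0.0, 0.5, 1.0, 5.0]))  # [0.0, 0.3, 0.3, 0.4]
```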
Extended Data Table 1 | Crystallographic data and refinement statistics

                           Ca2+-bound Syt1-SNARE    Ca2+-bound Syt1-SNARE    Mg2+-bound Syt1-SNARE
                           complex (long unit       complex (short unit      complex (short unit
                           cell crystal form)       cell crystal form)       cell crystal form)
Data collection
Beamline                   SLAC-LCLS                APS-NECAT                APS-NECAT
Space group                P2₁2₁2₁                  P2₁2₁2                   P2₁2₁2
Cell dimensions
  a, b, c (Å)              69.6, 171.1, 291.9       69.1, 171.6, 146.9       69.1, 171.8, 146.6
  α, β, γ (°)              90.0, 90.0, 90.0         90.0, 90.0, 90.0         90.0, 90.0, 90.0
Resolution (Å)             20.0-3.50 (3.62-3.50)    50.0-3.60 (3.73-3.60)    85.9-4.10 (4.32-4.10)
Rmerge (%) (rotation)†     -                        6.7 (74.1)               10.4 (64.6)
Rmerge (%) (still)‡        39.7 (32.2)              -                        -
CC1/2                      92.7 (35.5)              99.9 (89.5)              99.7 (64.6)
I/σI                       7.7 (2.6)                12.3 (1.9)               8.8 (1.9)
Completeness (%)           87.6 (65.6)              99.7 (98.7)              95.9 (97.0)
Multiplicity (rotation)†   -                        14.5 (12.4)              3.4 (3.5)
Multiplicity (still)‡      5.0 (1.8)                -                        -

Refinement
Resolution (Å)             20.0-3.50 (3.62-3.50)    50.0-3.60 (3.73-3.60)    50.0-4.10 (4.25-4.10)
No. reflections            39174 (2884)             20846 (2004)             13519 (1320)
Rwork / Rfree              0.322 / 0.353            0.249 / 0.289            0.276 / 0.323
No. atoms
  Protein                  10890                    6506                     6510
  Ca2+                     19
  Mg2+                     0
B-factors
  Protein
  Ca2+
  Mg2+
R.m.s. deviations
  Bond lengths (Å)         0.003                    0.004                    0.003
  Bond angles (°)          0.758                    0.862                    0.714

*Values in parentheses are for the highest resolution shell.
†"(rotation)" refers to rotation diffraction data collected at the APS synchrotron.
‡"(still)" refers to still diffraction data collected at the LCLS XFEL.
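For reference, the Rmerge rows follow the conventional definition Rmerge = Σ_hkl Σ_i |I_i(hkl) - <I(hkl)>| / Σ_hkl Σ_i I_i(hkl). This is the spread of repeated intensity measurements of each unique reflection around their mean, relative to the total intensity. The sketch below computes it from a hypothetical dictionary of observations. It is not the data-processing code used for these data sets.

```python
def r_merge(observations):
    """observations: dict mapping a unique reflection (h, k, l) to a list of
    repeated intensity measurements. Returns Rmerge as a fraction."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(value - mean_intensity) for value in intensities)
        denominator += sum(intensities)
    return numerator / denominator

# Hypothetical repeated measurements of two reflections
obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 2, 1): [50.0, 50.0]}
print(f"Rmerge = {100 * r_merge(obs):.1f}%")  # Rmerge = 5.0%
```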
Extended Data Table 2 | Data summary table for the single vesicle-vesicle fusion experiments with Syt1 and SNAP-25 mutants

                      No. of spontaneous   No. of Ca2+-triggered   Total no. of analyzed events   No. of independent
                      fusion events        fusion events           (no. of associated             experiments (N)
                                                                   vesicle pairs)
Syt1 WT               90                   167                     3764                           5
                      100                  262                     4632                           6
                      49                   152                     3035                           5
                      64                   173                     3564                           4
                      159                  256                     6503                           9
Syt1 mutants
  E295A/Y338W         113                  246                     5872                           12
  R398Q/R399Q         166                  217                     6724                           7
  R281A/R398A/R399A   133                  271                     5134                           7
  Syt1 quintuple      49                   143                     5241                           7
SNAP-25 mutants
  K40A/D166A          61                   139                     2549                           7
  D51A/E52A/E55A      81                   150                     2717                           7
  SNAP-25 quintuple   105                  261                     6741
Control
  +ATP                59                   129                     2830                           5
  -SNAP-25            24                   36                      3040                           5
  -Syt1               57                   58                      2379                           5
  Mock                -                    25                      1947                           5
Structural insights into the bacterial
carbon-phosphorus lyase machinery

Bjarne Jochimsen & Ditlev E. Brodersen


Phosphorus is required for all life, and microorganisms can extract it from their environment through several metabolic
pathways. When phosphate is in limited supply, some bacteria are able to use phosphonate compounds, which require
specialized enzymatic machinery to break the stable carbon-phosphorus (C-P) bond. Despite its importance, the details
of how this machinery catabolizes phosphonates remain unknown. Here we determine the crystal structure of the
240-kilodalton Escherichia coli C-P lyase core complex (PhnG-PhnH-PhnI-PhnJ; PhnGHIJ), and show that it is a
two-fold symmetric hetero-octamer comprising an intertwined network of subunits with unexpected
self-homologies. It contains two potential active sites that probably couple phosphonate compounds to ATP and
subsequently hydrolyse the C-P bond. We map the binding site of PhnK on the complex using electron microscopy,
and show that it binds to a conserved insertion domain of PhnJ. Our results provide a structural basis for understanding
microbial phosphonate breakdown.


Phosphonate compounds that contain a stable C-P bond are used as a 
source of phosphate by microorganisms in many natural environments 
where low levels of free and organic phosphate limit growth’. The C-P 
lyase pathway, which converts phosphonate into 5-phosphoribosyl-α-1-
diphosphate (PRPP) in an ATP-dependent fashion, is activated upon
phosphate starvation in many bacterial species including Escherichia 
coli’. The enzymes of this pathway have a very broad substrate spe- 
cificity enabling the bacteria to utilize a wide range of compounds for 
growth including alkyl, amino-alkyl and aryl phosphonates*”’. 

In E. coli, the 14-cistron phn operon is required for phosphonate 
uptake and utilization and encodes an ATP-binding cassette trans- 
porter (PhnC, PhnD and PhnE), a regulatory protein (PhnF) and 
components required for enzymatic conversion of phosphonate into 
PRPP (PhnGHIJKLMNOP)*. PhnG, PhnH, PhnI and PhnJ have 
been shown to form a stable protein complex, which we term the C-P 
lyase core complex, probably with PhnG and PhnI at its centre’*».
The core complex stably associates with a fifth protein, PhnK, which 
resembles ABC cassette proteins, with unknown stoichiometry™. 
PhnJ contains an iron-sulfur cluster required for C-P bond cleavage 
via an S-adenosyl methionine (SAM)-dependent radical mech- 
anism'*"*, while PhnI is a nucleosidase capable of deglycosylating 
ATP and GTP to ribose 5-triphosphate’®. A reaction mechanism for 
the breakdown of phosphonate via the C-P lyase pathway was pro- 
posed where PhnI, supported by PhnG, PhnH and PhnL, the latter a 
protein not present in the core complex, catalyses the transfer of the 
phosphonate moiety to the ribose 1’ position of ATP by displacing 
adenine, generating a ribose 5’ -triphosphate alkyl phosphonate inter- 
mediate (Fig. 1a and Extended Data Fig. 1). Following pyrophosphate
release by PhnM, PhnJ cleaves the C-P bond and PhnP/PhnN convert 
the resulting ribose cyclic phosphate into PRPP'®"’. The C-P lyase 
core complex thus harbours two key activities of this pathway: coupling
of the phosphonate to ATP (PhnG, PhnH and PhnI) and C-P
bond cleavage (PhnJ)’*. PhnH is the only component of the C-P lyase
core complex that has been structurally characterized, and it displays a
fold related to the pyridoxal-5′-phosphate-dependent transferases.
It forms a homodimer when expressed independently”, and its role
within the complex is unclear.


The global architecture of C-P lyase 

We purified the E. coli C-P lyase core complex and determined its
crystal structure by molecular replacement in combination with
single-wavelength anomalous dispersion, using a Ta6Br12 cluster
derivative and the PhnH crystal structure’’ as a search model. The
structure was refined against a native data set extending to 1.7 Å, with
final R factors of 14.9% (Rwork) and 17.6% (Rfree) (Extended
Data Table 1 and Extended Data Fig. 2). The structure consists of two
copies each of PhnG (16 kDa), PhnH (21 kDa), PhnI (39 kDa) and
PhnJ (32 kDa), comprising a total of 1,958 amino acid residues in the
asymmetric unit (Fig. 1b and Extended Data Fig. 3), and is complete
except for a few residues located at the subunit termini. The structure
includes four sulfate ions, four zinc ions and 1,792 solvent molecules.
Together, the eight polypeptides form a compact and intertwined, two-fold
symmetric hetero-octamer that can be described as (PhnGHIJ)2,
with a total molecular mass of 240 kDa (Fig. 1c, d), consistent with its
behaviour in solution”.
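The Rwork and Rfree values quoted above use the standard crystallographic R factor, R = Σ||Fobs| - k|Fcalc|| / Σ|Fobs|. Rwork is evaluated on the reflections used in refinement, and Rfree on a small test set excluded from refinement. The sketch below uses made-up amplitudes, a hypothetical choice of every fifth reflection as the test set, and a scale factor k of 1 for simplicity.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R factor: sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)."""
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)

def r_work_r_free(f_obs, f_calc, is_test):
    """Split reflections into working and test (free) sets and return (Rwork, Rfree)."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free

# Hypothetical structure-factor amplitudes; every fifth reflection is a test reflection
f_obs = [120.0, 80.0, 95.0, 60.0, 40.0, 150.0, 70.0, 55.0, 30.0, 100.0]
f_calc = [110.0, 85.0, 90.0, 66.0, 48.0, 140.0, 72.0, 50.0, 33.0, 88.0]
is_test = [i % 5 == 4 for i in range(len(f_obs))]
print(r_work_r_free(f_obs, f_calc, is_test))
```

Because the test reflections never inform the model, Rfree is normally higher than Rwork, as it is for the values reported above.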

The C-P lyase core complex resembles the letter “H” with rounded
arms that are twisted approximately 45° in and out of the plane with
respect to each other. The arms are formed on opposing sides by
the two PhnG molecules and on the other sides by tight complexes
between PhnJ and PhnH (Fig. 1c). At the centre of the molecule,
a compact PhnI homodimer forms a disc-like structure that serves
as a central hub for attachment of the other subunits (Fig. 1c, d, green).
The core domain of PhnI, which is the largest single domain in the
structure, has a novel α + β fold comprising a four-stranded, antiparallel
β-sheet next to a four-helix bundle; together these form a unique
fold, the β-barrel domain (Figs 1c and 2a). At both termini there are
helical extensions of approximately 35 residues that grasp PhnJ and
tether it to the complex via extensive interactions (Fig. 3a). In turn,
PhnJ attaches PhnH to the complex through packing of conserved
α-helices in both proteins (Fig. 1c).
Figure 1 | Overall architecture of the C-P lyase core complex. a, The core
complex catalyses transfer of a phosphonate to the 1′ position of ATP (PhnI
assisted by PhnG, PhnH and PhnL) and cleavage of the C-P bond (PhnJ).
b, Overview of the four proteins, with dashed lines indicating conserved
structural domains. Functional residues are shown with the amino acid
one-letter code. BBD, β-barrel domain; CID, central insertion domain;
CMD, C-terminal mini domain; NTD, N-terminal domain. c, Overall
structure of the 240 kDa C-P lyase core complex. d, Schematic architecture
of the complex with structural domains indicated.

PhnJ has a compact α + β fold surrounded by two mini-domains,
the central insertion domain (CID) and the C-terminal mini domain
(CMD) (Figs 1b, 2b and Extended Data Fig. 3). Surprisingly, the core
folds of PhnJ and PhnH are nearly identical (Cα r.m.s.d. 2.5 Å),
despite very little sequence similarity (Extended Data Fig. 4a).
Moreover, the interactions in the PhnH-PhnJ heterodimer closely
resemble those observed in the crystal structure of the isolated
PhnH homodimer (Extended Data Fig. 5a)'*. The CID is an insertion
in PhnJ between β5 and β6 of the corresponding PhnH fold and
consists of two α-helices and a short 3₁₀-helix (Fig. 2b). The CID is
well conserved among PhnJ orthologues and contacts both of the
central PhnI molecules (Fig. 3b). Finally, the CMD is located at the
C terminus and consists of a small β-hairpin and a helix. It is stabilized
by a zinc ion coordinated by four conserved cysteine residues: Cys241,
Cys244, Cys266 and Cys272 (Fig. 2b).

The PhnI monomers bind each other via an extensive, conserved
surface interaction area comprising ~75% of the total PhnGHIJ
dimerization interface (Extended Data Fig. 6). Each molecule of
PhnI interacts with both copies of PhnG (Fig. 3c), the smallest protein
in the complex, which displays an elongated α + β fold with a four-stranded,
antiparallel β-sheet against a four-helix bundle (β-barrel
domain, Fig. 2c). Despite very little sequence similarity, the closest
known structural homologue of PhnG is PhnI, with which it shares
both the long β-hairpin and the helical bundle (Fig. 2 and Extended
Data Fig. 4b). The PhnG β-hairpin and C-terminal helix form a
molecular clamp that connects to a groove in PhnI, forming an unusually
long, combined β-barrel domain (80 Å, Fig. 3d).

Figure 2 | Details of subunit structures. a-d, Details of the individual protein
structures in the complex: PhnI (a), PhnJ (b), PhnG (c) and PhnH (d), aligned
to show their structural homologies and with domain colours as in Fig. 1b. Ions
are shown as spheres (sulfate, red and yellow; zinc, pink). BBD, β-barrel domain;
CID, central insertion domain; CMD, C-terminal mini domain; NTD,
N-terminal domain.

Figure 3 | Subunit interactions within the C-P lyase core complex. a, The
termini of PhnI (C terminus, grey; N terminus, cyan) grasp PhnJ (semi-transparent
blue surface). Domains in PhnI are coloured as in Fig. 1 and the sulfate ion is shown
as spheres. b, The central insertion domain (CID, purple) contacts both copies
of PhnI (green surfaces). Residues involved in the interaction are shown as
sticks. c, PhnI (green/cyan) interacts with both copies of PhnG (orange).
d, PhnG (orange) grabs both sides of PhnI (green) using the β-barrel domain
and C-terminal helix (indicated with an asterisk). The combined β-barrel
domain is 80 Å long.

The iron-sulfur binding site

PhnJ belongs to the anaerobic radical SAM enzyme superfamily, in which
three conserved cysteine residues coordinate a cubane-like Fe4S4 cluster”
that promotes formation of a free radical required for catalysis
by reductive cleavage of SAM to a 5′-deoxyadenosyl radical (Ado-CH2•)
and L-methionine”'**. PhnJ does not contain the canonical CX3CX2C
motif but rather a CX2CX21C motif involving Cys241, Cys244 and
Cys266, which are both necessary and sufficient for reconstitution of
the iron-sulfur cluster in vitro'”’’. In the structure, these cysteines are
juxtaposed and coordinate a zinc ion (Fig. 4a) in an arrangement that
closely matches that expected for an Fe4S4 cluster-containing protein
(Fig. 4b). Furthermore, superimposing an S-adenosyl methionine
activase structure on this region reveals a small groove on the surface
of the C-P lyase core complex next to the cluster site that might accommodate
SAM” (Extended Data Fig. 5b).
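The motif notation used above can be derived directly from residue numbers: the number of residues between consecutive cysteines gives each X subscript. The sketch below turns the PhnJ positions Cys241, Cys244 and Cys266 into motif form and compares their spacing with the canonical radical SAM spacing. The helper names and the all-X placeholder segment are illustrative only; no real PhnJ sequence is used.

```python
import re

def motif_from_positions(positions):
    """Write cysteine residue numbers as a CXnC-style motif (n = residues in between)."""
    motif = "C"
    for prev, cur in zip(positions, positions[1:]):
        motif += f"X{cur - prev - 1}C"
    return motif

def placeholder_segment(positions, filler="X"):
    """Hypothetical segment: cysteines at the given spacing, all other residues as `filler`."""
    segment = "C"
    for prev, cur in zip(positions, positions[1:]):
        segment += filler * (cur - prev - 1) + "C"
    return segment

# Canonical radical SAM spacing C-X3-C-X2-C as a regular expression
CANONICAL = re.compile(r"C...C..C")

phnj_cys = [241, 244, 266]
print(motif_from_positions(phnj_cys))                             # CX2CX21C
print(bool(CANONICAL.fullmatch(placeholder_segment(phnj_cys))))   # False
print(bool(CANONICAL.fullmatch(placeholder_segment([1, 5, 8]))))  # True
```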

According to the proposed reaction mechanism, the Ado-CH2•
radical is transferred to the universally conserved Gly32 of PhnJ,
generating a stable glycyl radical enzyme that supports multiple turnovers
without further SAM consumption”. In this scheme,
transfer of the radical from Gly32 to the fourth conserved cysteine
residue (Cys272) generates a thiyl radical capable of homolytic C-P
bond cleavage of 5-phosphoribosyl-1-phosphonate (Extended Data
Fig. 1) through a thiophosphonate radical intermediate’’. Cys272 is
situated adjacent to the cluster site, where it is the fourth ligand binding
the zinc ion, while Gly32 is located more than 30 Å away, in the
vicinity of PhnH (Fig. 4c). A direct involvement of Gly32 in the
reaction’ is therefore difficult to reconcile with the structure; however,
it cannot be excluded that structural rearrangements could alter
the position of the cluster relative to Gly32 and bring them into
proximity. The CMD containing the cluster site has higher B factors,
suggesting that it is relatively loosely attached to the PhnJ core and
could perhaps detach during the reaction (Fig. 2b and Extended Data
Fig. 5c).


A second potential active site 


At the interface of PhnI and PhnJ, three universally conserved histidine
residues come together to form a second metal-ion-binding site.
Analysis of the anomalous difference density confirms that this His
site also contains zinc (Fig. 4d). Two of the residues (PhnI His328 and
His333) coordinate the zinc ion directly (2.4 Å), while the third (PhnJ
His108) is further away (4.5 Å). The three histidines are located in a
cavity between PhnI and PhnJ that connects to the surface of the
complex via a solvent-accessible tunnel (Fig. 4e). The cavity also
contains a sulfate ion located 9.5 Å from the zinc, which may mimic
a substrate phosphate or phosphonate. Finally, access to the cavity is
defined by the PhnJ CID, which forms a lid over it.

Figure 4 | Overview of the active sites of the C-P lyase core complex.
a, Details of the iron-sulfur cluster site with a bound zinc ion (green, anomalous
difference density at 6.0σ). b, The Fe4S4 cluster found in S-adenosyl methionine
activase (anSMEcpe, Protein Data Bank accession code: 4K37)™* shown in the
same orientation as in a. Sticks are coloured by standard atom colours (C,
green/salmon; Fe, orange; S, yellow). c, The environment of PhnJ Gly32 (space
fill). d, The zinc-ion-binding site at the interface between PhnI and PhnJ (His
site) with the sulfate ion shown. e, A tunnel (magenta) leads from the sulfate
(red and yellow) and zinc ions (grey) to the surface of the complex (blue area).

Studies of zinc-binding proteins show that structural zinc sites
usually have four protein ligands, while active-site zinc ions have a
more open coordination sphere with 2-3 ligands, similar to that
observed in this case**. To assess the functional importance of the
His site, we used genetic complementation to determine whether
mutation of the histidine residues affects the ability of E. coli to
utilize phosphonates. A plasmid-borne copy of the wild-type
phnGHIJKLMNOP allele was used to complement E. coli ΔphnHIJKLMNOP
under conditions where phosphonates were the sole
phosphate source (Extended Data Fig. 7)**. Unlike the wild
type, none of the variants (PhnI(H333A), PhnI(H328A;H333A),
PhnJ(H108A) and PhnJ(C272A)) could utilize phosphonates;
we therefore conclude that the His site is required for the activity of
the C-P lyase core complex in vivo.
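The coordination-number argument above can be made concrete with a toy classifier. It counts protein ligands within a distance cutoff of the metal and applies the rule of thumb quoted above: four ligands suggest a structural site, and two or three suggest an active site. The 3.0 Å cutoff is an assumption made for illustration. The only inputs are the His-site distances reported in the text.

```python
def count_ligands(distances, cutoff=3.0):
    """Number of protein ligands within `cutoff` Å of the metal (cutoff is an assumption)."""
    return sum(1 for d in distances.values() if d <= cutoff)

def classify_zinc_site(distances, cutoff=3.0):
    """Rule of thumb: 4 protein ligands -> structural, 2-3 -> possibly catalytic."""
    n = count_ligands(distances, cutoff)
    if n >= 4:
        return "structural-like"
    if n in (2, 3):
        return "catalytic-like"
    return "unclassified"

# Zinc-histidine distances reported for the His site (Å)
his_site = {"PhnI His328": 2.4, "PhnI His333": 2.4, "PhnJ His108": 4.5}
print(count_ligands(his_site), classify_zinc_site(his_site))  # 2 catalytic-like
```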


PhnK binds via the PhnJ CID 


The C-P lyase core complex stably associates with a fifth component,
PhnK (28 kDa)"*. The function of PhnK is unclear, but it contains the
consensus elements of an ATP-binding cassette protein, suggesting
that it might deliver nucleotides for the reaction (Extended Data
Fig. 8)'*. Although PhnK co-purifies stably with the complex, we were unable to
obtain crystals of a complex including PhnK. We therefore mapped the
PhnK-binding site on the complex using negative-stain electron
microscopy by generating a 3D reconstruction of purified
PhnGHIJK (Fig. 5a and Extended Data Fig. 9). The crystal structure
fits tightly within the resulting electron microscopy density map and
reveals additional density in a groove close to the two-fold symmetry
axis, near two regions of highly conserved residues on PhnJ (Fig. 5a, b).
The fold of PhnK can be roughly modelled using a homologous
nucleotide-binding domain of an ABC transporter (Protein Data Bank
accession code: 4FWI)”. The electron microscopy map is consistent
with a single PhnK binding unilaterally to the complex, breaking the
two-fold symmetry (Fig. 5c). Although the exact orientation of PhnK
cannot be established at this resolution, we note that one side is highly
conserved among orthologues, suggesting that it comprises the
interaction surface (Fig. 5c).

Figure 5 | Mapping of PhnK using electron microscopy. a, Orthogonal views
of the negative-stain electron microscopy 3D reconstruction with the C-P lyase
core complex crystal structure docked and the position of PhnK indicated.
Colours are as in Fig. 1c. b, Surface conservation of the C-P lyase core complex
from variable (teal) to conserved (burgundy), with the PhnK-binding region
(dashed area) indicated. c, A single PhnK molecule (based on Protein Data
Bank entry 4FWI)” fitted into the electron microscopy density masked to
remove the C-P lyase core complex. The model is coloured by conservation
using ConSurf” on the same scale as in b.

ABC modules often dimerize in a head-to-tail fashion, binding 
ATP between the Walker A/B motifs of one subunit and the ABC 
motif of the other”*. PhnK contains a variant ABC motif (FSGGMQ 
versus LSGGQ), which could serve to bind the C-P lyase core complex
(Extended Data Fig. 8). The conserved CID protrudes
into the PhnK-binding region, so to probe its importance we constructed
C-P lyase core complexes lacking residues 130-171 of PhnJ
(Extended Data Fig. 10). Purification of PhnGHIJ(ΔCID)K demonstrated
that upon deletion of the CID, the C-P lyase core complex
remains intact but PhnK is missing, indicating that the CID
region of PhnJ is required for tethering PhnK to the core complex.


Discussion 


In this paper, we delineate the organization and detailed molecular 
structure of a core complex involved in phosphonate catabolism in 
bacteria. We show that four of the proteins required for phosphonate 
breakdown assemble into a large, hetero-octameric core complex with 
two-fold symmetry and that the symmetry is broken by binding of a 
fifth, ATP-binding subunit, PhnK. The structure is not immediately 
compatible with the direct involvement of Gly32 (PhnJ) in catalysis, 
but structural rearrangements may affect the location of this residue 
during the reaction. Many glycyl radical enzymes require separate 
activation enzymes that dissociate upon radical formation, a task that
could also be performed by a flexible internal domain’’.

The structure indicates the existence of a second active site at the 
interface of PhnI and PhnJ. Analysis of difference electron density 
maps from several independent data sets revealed a consistent density 
next to the bound zinc ion, but we have been unable to identify the 
bound molecule. We also carried out co-crystallization using a range 
of compounds including nucleotides and phosphonates, but no fur- 
ther substrate binding was observed. This suggests that the complex 
needs an Fe,S, cluster, or is not in the correct conformational state to 
bind a substrate. We speculate that the His site is required for coupling 
a phosphonate to ATP, which is known to depend on PhnI’*. 

Using electron microscopy we locate the binding region for PhnK 
on the C-P lyase core complex. While this does not reveal the role of 
PhnK in the reaction, we note that the region is close to the His site 
and it is therefore possible that structural changes in PhnK occurring 
upon ATP hydrolysis may affect substrate access. With the detailed 
architecture of the C-P lyase core complex thus delineated, future 
work will focus on understanding the requirements of the two reac- 
tions catalysed by the complex and definitively locating the binding 
sites of substrates and reaction intermediates. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression and protein purification. The construction of pHO572 
(expressing phnGHIJ) and pHO575 (expressing phnGHIJK) as well as gene 
expression in E. coli strain HO2735 (Δ(lac)X74 Δ(phnCDEFGHIJKLMNOP)33-30/F′
lacIq zzf::Tn10) were described previously’. pHO575 encodes a
C-terminally six-histidine-tagged version of PhnK, whereas PhnGHIJ has no tag
but still binds to Ni2+-NTA agarose beads. The PhnGHIJ(ΔCID) and
PhnGHIJ(ΔCID)K constructs were created by site-directed mutagenesis using
primers 5′-GTGCCAATCCCCGAGGGCGGCTATCCGGTGAAGGTA-3′ (ΔCID
forward) and 5′-TACCTTCACCGGATAGCCGCCCTCGGGGATTGGCAC-3′
(ΔCID reverse), which result in the replacement of residues 130-172 of PhnJ by
two glycine residues (underlined in the primers). Cells were in all cases grown at
37 °C in Luria Broth (LB) medium, and gene expression was induced overnight at 18 °C
using 0.5 mM isopropyl-β-D-thiogalactoside (IPTG). Cells were pelleted
by centrifugation at 16,000 r.p.m. for 45 min at 4 °C and resuspended in lysis buffer
(50 mM HEPES, pH 7.5, 500 mM NaCl, 5 mM MgCl2, 20% (v/v) glycerol, and 3 mM
2-mercaptoethanol) supplemented with Complete Protease Inhibitor Cocktail tablets
(Sigma), and lysed by high-pressure homogenization (EmulsiFlex-C5, Avestin) at
15,000 p.s.i. The lysate was centrifuged at 16,000 r.p.m. for 45 min to remove
cell debris, and the supernatant was bound to Ni2+-NTA agarose beads on a 5-ml pre-packed HisTrap HP
column (GE Healthcare) pre-equilibrated with lysis buffer (PhnGHIJ) or lysis buffer
plus 20 mM imidazole (PhnGHIJK). In all cases, the complexes were eluted by
increasing the imidazole concentration to 250 mM. Following overnight dialysis at
4 °C against buffer LS1 (50 mM HEPES, pH 7.5, 100 mM NaCl, 5 mM MgCl2, and
5 mM 2-mercaptoethanol), the samples were applied to a 1 ml Source 15Q column
(GE Healthcare) pre-equilibrated with buffer LS1 and eluted using a linear gradient
from 100-600 mM NaCl. The samples were then diluted to reach 250 mM NaCl and
passed over a 1 ml Mono Q column (GE Healthcare) pre-equilibrated with buffer LS2
(50 mM HEPES, pH 7.5, 250 mM NaCl, 5 mM MgCl2, and 5 mM 2-mercaptoethanol),
washed, and eluted using a 250-400 mM NaCl gradient. Finally, the complexes
were purified on a Superdex 200 10/300 GL size-exclusion column (GE Healthcare)
equilibrated with buffer GF (50 mM HEPES, pH 7.5, 300 mM NaCl, and 5 mM
2-mercaptoethanol). A 2 l culture typically yielded 4-8 mg of purified protein
complex. Purification of the PhnGHIJ(ΔCID) and PhnGHIJ(ΔCID)K constructs
was stopped after the Source Q column.
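The two ΔCID primers are designed as exact reverse complements of each other. Read in codons, the forward primer also encodes the Gly-Gly pair that replaces the deleted CID. The sketch below checks both points. It assumes the forward primer is read in frame from its first base, since the Methods do not state the frame, and it uses the standard genetic code built from the usual TCAG codon ordering.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def translate(seq):
    """Translate in frame 1, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq) - 2, 3))

forward = "GTGCCAATCCCCGAGGGCGGCTATCCGGTGAAGGTA"
reverse = "TACCTTCACCGGATAGCCGCCCTCGGGGATTGGCAC"
print(reverse_complement(forward) == reverse)  # True
print(translate(forward))                      # VPIPEGGYPVKV
```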

Crystallization and structure determination of PhnGHIJ. Crystals of the
PhnGHIJ complex were obtained by batch crystallization at 4 °C with a
reservoir solution containing 20% (w/v) PEG 10,000, 0.1 M HEPES, pH 7.5, 1 mM
trisodium citrate dihydrate, 2.3% (w/v) 1,8-diaminooctane, and 5 mM
2-mercaptoethanol. Crystallization drops contained 1 µl protein sample mixed with 0.7 µl
reservoir solution and 0.3 µl of a micro-seed stock” obtained from early-stage hits.
Crystals suitable for data collection appeared within 2-3 days, reaching maximum
dimensions of 1.0 × 0.2 × 0.2 mm. Crystals were harvested, cryo-protected by
gradual addition of glycerol to a final concentration of 25% (v/v), and flash-frozen
in liquid nitrogen. For structure solution by molecular replacement with
single-wavelength anomalous dispersion (MR-SAD), crystals were derivatized for 24 h
with Ta6Br12 (Proteros Biostructures)*', which was added directly to the
crystallization drop as a powder. Diffraction data were collected at 100 K at the X06DA
beamline at SLS, Villigen, Switzerland, on a PILATUS 2M detector (native
crystals) or on a PILATUS 6M detector at the X06SA beamline (Ta6Br12 data). Data sets
were processed with xia2 (ref. 32), and the structure was solved by the MR-SAD
method using Phaser and AutoSol via PHENIX*’. The PhnH structure’? (PDB
ID: 2FSU) was used as a molecular replacement search model in Phaser to locate
the Ta6Br12 sites by MR-SAD, whereby the partial molecular replacement
solution allowed identification of the Ta6Br12 cluster sites and served as a source of
phase information in phenix.autosol. Initial phases were obtained after density
modification using RESOLVE in the phenix.autosol pipeline, and the resulting
maps were used to auto-build secondary structure elements using ARP/wARP from the
CCP4 package”. The resulting partial model was then used as a search model for
molecular replacement against the native data using Phaser and run through 25
cycles of backbone auto-tracing using the native data set in SHELXE™. Finally, a
near-complete model of PhnGHIJ could be auto-built using phenix.autobuild,
RESOLVE, and Buccaneer*®. Missing parts of the model were completed
manually in Coot*”. The model was iteratively improved and refined using
phenix.refine*’, and validated using MolProbity**. The final model contains 1,958 amino
acid residues, four bound sulfate ions, four zinc ions, and 1,792 molecules of
ordered solvent. The model is complete except for 18 residues, all located
at the termini of the subunits. Structure figures were made using PyMOL”,
Chimera*’, and ConSurf”’.
Electron microscopy. Purified PhnGHIJK samples were applied to Quantifoil R2/2 holey carbon on copper grids (Quantifoil, Jena)’, covered with an additional thin film of amorphous carbon, and rendered more hydrophilic with a 9:1 argon-oxygen plasma (Fischione Model 1070). The specimens were stained with 3% ammonium molybdate at pH 8 followed by 2% uranyl acetate. Micrographs were recorded at 44,000× magnification on a Tecnai T12 microscope equipped with a US4000 4K × 4K pixel CCD detector (Gatan) at 120 keV, with defoci in the 0.8-2 µm range and an electron dose of 20 electrons/Å². In total, 10,137 single particles were manually picked from 105 micrographs using e2boxer from the EMAN2 software package”, and contrast transfer function parameters were determined using CTFFIND3 (ref. 43). Three iterations of 2D classification into 300 class averages were performed in RELION™ to identify particles that did not align well with each other, and these particles were removed from subsequent analysis. After 2D classification, the final set of 10,033 particles was used to calculate a 3D reconstruction in RELION without imposed symmetry. The initial model for the reconstruction was prepared by low-pass filtering a density map generated from the C-P lyase core complex crystal structure to 40 Å. The final model has a resolution of 16 Å by the 0.143 'gold standard' Fourier shell correlation (FSC) criterion‘ and a resolution of 28 Å versus the crystal structure at FSC = 0.5. The latter is probably closer to the true resolution of the map, as the granular nature of negative stain can introduce correlations in the half-maps that are not related to the protein structure. The map was validated using 419 tilt pairs recorded at angles of 0° and 30° (P = 0.01, κ = 2.7)”. The FSC versus the crystal structure shows correlation between the crystal structure and the electron microscopy density at low resolution, whereas at higher resolution deviations due to structural differences between the C-P lyase core complex and the PhnGHIJK complex become apparent.
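The 0.143 'gold standard' criterion quoted above is read off a Fourier shell correlation curve computed between two independently processed half-maps. As a minimal sketch of that calculation (not RELION's implementation), the NumPy code below bins Fourier coefficients into integer-radius shells, computes the normalized cross-correlation per shell, and reports the resolution of the first shell that falls below the threshold. It assumes cubic maps of equal size and a known pixel size; the function names are illustrative, and real pipelines also apply masks and interpolate the crossing point.

```python
import numpy as np


def fourier_shell_correlation(map1: np.ndarray, map2: np.ndarray):
    """FSC between two cubic 3D density maps of identical shape.

    Returns (spatial frequency in cycles per voxel, FSC value) for each
    integer-radius shell from the origin out to n/2 - 1.
    """
    if map1.ndim != 3 or map1.shape != map2.shape or len(set(map1.shape)) != 1:
        raise ValueError("maps must be cubic 3D arrays of the same shape")
    n = map1.shape[0]
    # Centre the zero frequency so that shell radii are easy to compute.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    coords = np.arange(n) - n // 2
    kx, ky, kz = np.meshgrid(coords, coords, coords, indexing="ij")
    shell = np.floor(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        mask = shell == s
        a, b = f1[mask], f2[mask]
        num = np.sum(a * np.conj(b))
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        fsc[s] = np.real(num) / den if den > 0 else 0.0
    freq = np.arange(n_shells) / n
    return freq, fsc


def resolution_at_threshold(freq, fsc, pixel_size: float, threshold: float = 0.143):
    """Resolution (same unit as pixel_size) of the first shell whose FSC drops
    below `threshold`; shell 0 (zero frequency) is skipped."""
    for f, c in zip(freq[1:], fsc[1:]):
        if c < threshold:
            return pixel_size / f
    # No crossing inside the sampled shells: report the Nyquist limit.
    return 2.0 * pixel_size
```

Passing a map computed from the crystal structure and the experimental map, instead of two half-maps, with `threshold=0.5` corresponds to the map-versus-model comparison also reported above.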

In vivo complementation. For the in vivo complementation studies, E. coli strain BW16711 (ΔphnHIJKLMNOP) was transformed with the plasmid pGY1’°, which confers ampicillin resistance and encodes phnGHIJKLMNOP, and analysed for its ability to grow on MOPS minimal plates“* supplemented with 0.2% glucose, 100 µg ml⁻¹ ampicillin, 0.1 mM IPTG and 0.2 mM of either methyl phosphonate, 2-aminoethyl phosphonate, or phosphate ion as a positive control. The PhnI(H328A) variant, the PhnI(H328A;H333A) double variant, the PhnJ(C272A) variant and the PhnJ(H108A) variant were introduced into pGY1 by site-directed mutagenesis by PCR using the following primers: 5′-GCAGGCTTTGTCTCGGCCCTCAAACTCCCCCA-3′ (H328A forward), 5′-TGGGGGAGTTTGAGGGCCGAGACAAAGCCTGC-3′ (H328A reverse), 5′-GCAGGCTTTGTCTCGGCCCTCAAACTCCCCGCCTACGTCGATTTCCA-3′ (H328A;H333A forward), 5′-TGGAAATCGACGTAGGCGGGGAGTTTGAGGGCCGAGACAAAGCCTGC-3′ (H328A;H333A reverse), 5′-TCCGATACCGATTATGCCCGCCAACAGAGCGA-3′ (C272A forward), 5′-TCGCTCTGTTGGCGGGCATAATCGGTATCGGA-3′ (C272A reverse), 5′-CTTATCCAGACGCGTGCCCGCATCCCCGAAAC-3′ (H108A forward), and 5′-GTTTCGGGGATGCGGGCACGCGTCTGGATAAG-3′ (H108A reverse), where changes relative to the wild-type sequence are underlined. Template DNA was digested with the methylation-dependent endonuclease DpnI before transformation of the non-ligated DNA into NovaBlue Singles (Novagen) electrocompetent E. coli cells and selection on ampicillin plates. All mutations were confirmed by sequencing of the entire phnGHIJKLMNOP region of the pGY1 vector to ensure that no other spontaneous mutations had been introduced that could prevent rescue of the BW16711 strain.
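Because each mutagenic primer pair above is used for whole-plasmid PCR, the reverse primer should be the exact reverse complement of its forward primer. A short Python check of the listed pairs is sketched below; the dictionary transcribes the sequences from the text, and the GC-content helper is an illustrative addition rather than part of the described protocol.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# (forward, reverse) pairs as listed in the text, both written 5'->3'.
PRIMERS = {
    "PhnI H328A": ("GCAGGCTTTGTCTCGGCCCTCAAACTCCCCCA",
                   "TGGGGGAGTTTGAGGGCCGAGACAAAGCCTGC"),
    "PhnI H328A;H333A": ("GCAGGCTTTGTCTCGGCCCTCAAACTCCCCGCCTACGTCGATTTCCA",
                         "TGGAAATCGACGTAGGCGGGGAGTTTGAGGGCCGAGACAAAGCCTGC"),
    "PhnJ C272A": ("TCCGATACCGATTATGCCCGCCAACAGAGCGA",
                   "TCGCTCTGTTGGCGGGCATAATCGGTATCGGA"),
    "PhnJ H108A": ("CTTATCCAGACGCGTGCCCGCATCCCCGAAAC",
                   "GTTTCGGGGATGCGGGCACGCGTCTGGATAAG"),
}

for name, (fwd, rev) in PRIMERS.items():
    matched = reverse_complement(fwd) == rev
    print(f"{name}: {len(fwd)} nt, GC {gc_content(fwd):.0%}, complementary pair: {matched}")
```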
Extended Data Figure 1 | The conversion of a phosphonate to 5-phosphoribosyl-α-1-diphosphate (PRPP). PhnI, supported by PhnG, PhnH and PhnL, catalyses the transfer of the phosphonate moiety to the 1′ position of the ribose of ATP through displacement of adenine, generating 5-triphosphoribosyl-α-1-phosphonate. After removal of pyrophosphate by PhnM, which yields 5-phosphoribosyl-α-1-phosphonate, PhnJ breaks the C-P bond of the ribose-coupled phosphonate, liberating the alkyl moiety and generating 5-phosphoribosyl 1,2-cyclic phosphate. Finally, the combined activities of PhnP (a phosphoribosyl cyclic phosphodiesterase) and PhnN (a ribosyl bisphosphate phosphokinase) result in the formation of PRPP via ribose 1,5-bisphosphate.
Extended Data Figure 2 | Representative examples of electron density. a, The interface between PhnJ (blue, residues 45-52 and 104-110, including a bound sulfate ion) and PhnI (green, residues 321-341) showing 2Fo − Fc electron density contoured at 2.0 r.m.s.d. b, Close-up of the aromatic side chains in the central β-sheet of PhnJ (residues 118-126, 203-207 and 211-217), with the same contouring as in a.
Extended Data Figure 3 | Sequences of the proteins of the C-P lyase core complex with secondary structure. Protein sequences are shown along with secondary structure assignment based on the crystal structure and colours as in Fig. 1. a, PhnG. The first α-helix is two residues longer in one of the two copies in the complex and is indicated with a dashed box. The β-barrel domain is shown in yellow and orange colours. b, PhnH. The numbering of β-strands follows the convention from ref. 19. Helix names (A-E) used in that paper are shown in parentheses. β1 and β3 are not included in the figures in this paper as they only have two hydrogen bonds each. c, PhnI. The N-terminal domain is shown in sea green and the β-barrel domain in green and light-green colours. d, PhnJ. The central insertion domain is shown in purple and the C-terminal mini domain in a darker blue colour. Figure produced using SecSeq (D. E. Brodersen, unpublished, http://www.bioxray.au.dk/~deb/secseq).
Extended Data Figure 4 | Cross alignments. a, Alignment of the amino acid sequences of E. coli PhnH and PhnJ. Identical residues are shown in red and conserved functionality in green. Secondary structure colours correspond to Fig. 1, and conserved regions are shown in dashed boxes. For PhnJ, the positions of the CID and CMD are indicated with brackets as well as with colours. b, Alignment of PhnG and PhnI (only partial sequence). Conserved regions are shown in dashed boxes. Figure produced using SecSeq (D. E. Brodersen, unpublished, http://www.bioxray.au.dk/~deb/secseq).
Extended Data Figure 5 | The structure and function of PhnJ. a, Two perpendicular views of the PhnH-PhnJ heterodimer as observed within the C-P lyase core complex (blue and red, left) and aligned views of the PhnH homodimer from the isolated crystal structure (PDB ID: 2FSU; green, right)’. b, Surface view of the C-P lyase core complex with PhnG shown in yellow, PhnI in green, and PhnJ in blue. The position of SAM as modelled from an S-adenosyl methionine activase enzyme (PDB ID: 4K37)** has been overlaid to visualize its putative placement in the pocket between PhnG and PhnJ. c, The C-P lyase core complex shown in surface representation with PhnJ and the CMD in cartoon, coloured by B factor to show flexibility (from B = 25 Å², blue, to B = 45 Å², magenta). The zinc and sulfate ions are shown as spheres.

Extended Data Figure 6 | Interaction areas within the C-P lyase core complex. a, Dimerization interface between the halves of the (PhnGHIJ)₂ complex. Colours as in Fig. 1. b, Interaction areas between the individual subunits within each dimer half. c, Two perpendicular views of the C-P lyase core complex shown in surface representation with colours as in Fig. 1c. d, Left, overview of the surface conservation of the C-P lyase core complex shown as a colour gradient from teal (variable) to burgundy (conserved) as indicated. Right, conservation at the interaction interfaces between the individual subunits of the complex.
Extended Data Figure 7 | In vivo complementation of E. coli ΔphnHIJKLMNOP. E. coli strain BW16711 (ΔphnHIJKLMNOP) complemented with a plasmid-borne copy of either wild-type (Wt) phnGHIJKLMNOP or variants thereof, including PhnJ(C272A), PhnJ(H108A), PhnI(H328A), or the PhnI(H328A;H333A) double variant. Growth is monitored on minimal plates with either no phosphorus source (a), 2-aminoethyl phosphonate (2-AEPn) (b), methyl phosphonate (MPn) (c), or phosphate ion (d). The data shown are representative of three independent experiments.
Extended Data Figure 8 | PhnK sequence alignment. Alignment of the protein sequence of E. coli PhnK with homologous proteins from a wide range of microorganisms. Conserved residues are shown on a teal background, and residues mentioned in the main text and the location of the ABC cassette consensus motifs (Walker A, Q motif, ABC motif, Walker B, D loop, and H motif) are indicated.
Extended Data Figure 9 | Negative stain electron microscopy of PhnGHIJK. a, Raw micrograph representative of 100 images collected. Scale bar is 500 Å. b, Selection of 2D reference-free class averages from a total of 300 classes showing the particle in various orientations. Each class is 172 Å wide. c, Fourier shell correlation (FSC) of the final electron microscopy density map as a function of resolution (black line), with FSC = 0.143 at 16 Å. The red line shows the correlation between the crystal structure and the electron microscopy density, which has FSC = 0.5 at 28 Å. The inset shows an equal-area projection plot for the electron microscopy tilt-pair validation data set. The circle is an approximation of a 99% confidence interval that contains the representative direction (blue plus), includes the true tilt direction (red cross), and excludes the untilted direction (origin).
Extended Data Figure 10 | Purification of the PhnGHIJ(ΔCID) and PhnGHIJ(ΔCID)K complexes. a, Sequence alignment of the CID region of PhnJ with the corresponding part of the structural domain of PhnH, where strands 5 and 6 of the β-sheet are connected by a short Gly-Gly turn (green residues). In PhnJ(ΔCID), the CID spanning residues 130-171 (red residues) is replaced by a similar dipeptide turn, which should maintain the overall domain fold. b, SDS-PAGE gel showing purified C-P lyase complexes (PhnGHIJJ and PhnGHIJJK) both with (Wt) and without (ΔCID) the CID on PhnJ. PhnK is missing from the complex purified from the PhnGHIJ(ΔCID)K construct (red arrow). The data shown are representative of three independent purifications. c, Overview of the C-P lyase core complex structure docked in the PhnGHIJJK electron microscopy density, with the locations of the PhnJ CID domains (purple) and PhnK (cyan cartoon) indicated.
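The ΔCID design in panel a amounts to excising residues 130-171 of PhnJ and splicing in a dipeptide turn. The sketch below shows that sequence operation under two stated assumptions: residue numbering is 1-based and inclusive, and the replacement dipeptide is Gly-Gly (the caption says only "a similar dipeptide turn", modelled on the Gly-Gly turn in PhnH). The placeholder sequence is hypothetical and stands in for the real PhnJ sequence.

```python
def replace_segment(seq: str, start: int, end: int, linker: str = "GG") -> str:
    """Return seq with residues start..end (1-based, inclusive) replaced by linker."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("segment lies outside the sequence")
    # seq[:start-1] keeps residues 1..start-1; seq[end:] keeps residues end+1 onwards.
    return seq[: start - 1] + linker + seq[end:]


# Hypothetical placeholder standing in for PhnJ; substitute the real sequence.
placeholder = "A" * 300
delta_cid = replace_segment(placeholder, 130, 171)  # assumed Gly-Gly linker
print(len(placeholder), "->", len(delta_cid))
```

For the 300-residue placeholder, the construct is 260 residues long: 42 residues are removed and 2 are added.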
Extended Data Table 1 | Data collection, phasing and refinement statistics

                                   Native                      Ta6Br12 derivative
Data collection
  Space group                      P2₁2₁2₁                     P2₁2₁2
  Cell dimensions
    a, b, c (Å)                    95.5, 133.7, 176.7          95.8, 143.1, 178.7
    α, β, γ (°)                    90.0, 90.0, 90.0            90.0, 90.0, 90.0
  Resolution (Å)*                  58.91-1.70 (1.74-1.70)      48.24-3.50 (3.60-3.50)
  Wavelength (Å)                   1.00004                     1.25524
  Unique reflections               246,950 (17,488)            59,846 (4,847)
  Rmeas (%)                        7.0 (114.7)                 15.9 (48.0)
  CC1/2 (%)                        99.9 (51.6)                 99.7 (96.4)
  I/σI                             17.7 (1.7)                  17.0 (7.7)
  Completeness (%)                 99.7 (99.1)                 100.0 (100.0)
  Redundancy                       5.6 (5.5)                   26.6 (26.3)

Refinement
  Resolution (Å)                   §94.-1.7
  No. of reflections               246,797
  Rwork/Rfree (%)                  14.9/17.6
  No. of atoms
    Protein (non-hydrogen)         30,097 (15,203)
    SO4²⁻ / Zn²⁺                   24
    Water                          1,792
  B-factors (Å²)
    Protein                        29.87
    SO4²⁻                          40.28
    Zn²⁺                           20.97
    Water                          21.15
  R.m.s. deviations
    Bond lengths (Å)               0.01
    Bond angles (°)                1.17
  Ramachandran statistics†
    Favoured (%)                   97.5
    Allowed (%)                    2.0
    Outliers (%)                   0.5

* Highest resolution shell is shown in parentheses, except where otherwise stated.
† Statistics from MolProbity via Phenix?®.
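Rwork and Rfree in the table are crystallographic R factors computed on the working set and on a held-out test set of reflections. A minimal NumPy sketch of the standard definition, R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|, is given below. It assumes the calculated amplitudes are already on the same scale as the observed ones (refinement programs such as phenix.refine also handle scaling and bulk-solvent modelling), and the amplitudes used in the check are invented for illustration.

```python
import numpy as np


def r_factor(f_obs, f_calc) -> float:
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs|.

    Assumes f_calc is already scaled to f_obs.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))


def r_work_r_free(f_obs, f_calc, free_flags):
    """Split reflections by a boolean test-set flag and return (Rwork, Rfree)."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    free = np.asarray(free_flags, dtype=bool)
    return r_factor(f_obs[~free], f_calc[~free]), r_factor(f_obs[free], f_calc[free])


# Invented amplitudes: three working reflections and one test reflection.
r_work, r_free = r_work_r_free([100, 80, 60, 40], [95, 84, 60, 30],
                               [False, False, False, True])
print(f"Rwork = {r_work:.4f}, Rfree = {r_free:.4f}")
```

Because the test reflections never enter refinement, Rfree rising well above Rwork is the usual warning sign of over-fitting.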
Conventional superconductivity at 203 kelvin at high pressures in the sulfur hydride system


A. P. Drozdov¹*, M. I. Eremets¹, I. A. Troyan¹, V. Ksenofontov² & S. I. Shylin²


A superconductor is a material that can conduct electricity without resistance below a superconducting transition temperature, Tc. The highest Tc that has been achieved to date is in the copper oxide system¹: 133 kelvin at ambient pressure² and 164 kelvin at high pressures³. As the nature of superconductivity in these materials is still not fully understood (they are not conventional superconductors), the prospects for achieving still higher transition temperatures by this route are not clear. In contrast, the Bardeen-Cooper-Schrieffer theory of conventional superconductivity gives a guide for achieving high Tc with no theoretical upper bound: all that is needed is a favourable combination of high-frequency phonons, strong electron-phonon coupling, and a high density of states⁴. These conditions can in principle be fulfilled for metallic hydrogen and covalent compounds dominated by hydrogen⁵,⁶, as hydrogen atoms provide the necessary high-frequency phonon modes as well as the strong electron-phonon coupling. Numerous calculations support this idea and have predicted transition temperatures in the range 50-235 kelvin for many hydrides⁷, but only a moderate Tc of 17 kelvin has been observed experimentally⁸. Here we investigate sulfur hydride⁹, where a Tc of 80 kelvin has been predicted¹⁰. We find that this system transforms to a metal at a pressure of approximately 90 gigapascals. On cooling, we see signatures of superconductivity: a sharp drop of the resistivity to zero and a decrease of the transition temperature with magnetic field, with magnetic susceptibility measurements confirming a Tc of 203 kelvin. Moreover, a pronounced isotope shift of Tc in sulfur deuteride is suggestive of an electron-phonon mechanism of superconductivity that is consistent with the Bardeen-Cooper-Schrieffer scenario. We argue that the phase responsible for high-Tc superconductivity in this system is likely to be H3S, formed from H2S by decomposition under pressure. These findings raise hope for the prospects for achieving room-temperature superconductivity in other hydrogen-based materials.

A search for high- (room-) temperature conventional superconductivity is likely to be fruitful, as the Bardeen-Cooper-Schrieffer (BCS) theory in the Eliashberg formulation puts no apparent limits on Tc. Materials with light elements are especially favourable as they provide high frequencies in the phonon spectrum. Indeed, many superconductive materials have been found in this way, but only a moderately high Tc = 39 K has been found in this search (in MgB2; ref. 11).
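The argument that light atoms and strong coupling favour a high Tc can be made concrete with the Allen-Dynes-modified McMillan formula, a standard estimate derived from Eliashberg theory that this paper does not itself use. In the sketch below, Tc scales linearly with the logarithmic-average phonon frequency omega_log (in kelvin) at fixed electron-phonon coupling lambda and Coulomb pseudopotential mu*, which is also why the ideal isotope coefficient is 0.5 when omega_log ∝ M^(-1/2). The parameter values are illustrative assumptions, and the strong-coupling correction factors of the full Allen-Dynes expression are omitted.

```python
import math


def allen_dynes_tc(omega_log_k: float, lam: float, mu_star: float = 0.1) -> float:
    """Simplified Allen-Dynes/McMillan Tc estimate in kelvin.

    Tc = (omega_log / 1.2) * exp(-1.04 (1 + lam) / (lam - mu* (1 + 0.62 lam)))
    The strong-coupling factors f1, f2 are omitted (set to 1).
    """
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0:
        return 0.0  # Coulomb repulsion wins: no superconductivity in this model
    return (omega_log_k / 1.2) * math.exp(-1.04 * (1.0 + lam) / denom)


# Illustrative parameters only, not values fitted to sulfur hydride.
print(round(allen_dynes_tc(1000.0, 2.0), 1))
# Halving the phonon frequency halves Tc at fixed lam and mu*.
print(allen_dynes_tc(500.0, 2.0) / allen_dynes_tc(1000.0, 2.0))
```

With omega_log = 1000 K, lambda = 2 and mu* = 0.1, this simplified formula gives about 144 K.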

Ashcroft’ turned attention to hydrogen, which has very high vibrational frequencies due to the light hydrogen atom and provides a strong electron-phonon interaction. Further calculations showed that metallic hydrogen should be a superconductor with a very high Tc of about 100-240 K for molecular hydrogen, and of 300-350 K in the atomic phase at 500 GPa (ref. 12). However, superconductivity in pure hydrogen has not yet been found, even though a conductive and probably semimetallic state of hydrogen has recently been produced’’. Hydrogen-dominated materials such as the covalent hydrides SiH4, SnH4 and so on might also be good candidates for high-Tc superconductivity®. Like pure hydrogen, they have high Debye temperatures. Moreover, heavier elements might be beneficial as they contribute to the low frequencies that enhance electron-phonon coupling. Importantly, lower pressures are required to metallize hydrides than to metallize pure hydrogen. Ashcroft's general idea was supported in numerous calculations”’® predicting high values of Tc for many hydrides. So far only a low Tc (~17 K) has been observed experimentally⁸.

For the present study we selected H2S, because it is relatively easy to handle and is predicted to transform to a metal and a superconductor at a low pressure P ~ 100 GPa with a high Tc ~ 80 K (ref. 10). Experimentally, H2S is known as a typical molecular compound with a rich phase diagram“. At about 96 GPa, hydrogen sulphide transforms to a metal’*. The transformation is complicated by the partial dissociation of H2S and the appearance of elemental sulfur at P > 27 GPa at room temperature, and at higher pressures at lower temperatures’*. Therefore, the metallization of hydrogen sulphide could be explained by elemental sulfur, which is known to become metallic above 95 GPa (ref. 16). No experimental studies of hydrogen sulphide above 100 GPa are known.

In a typical experiment, we performed loading and the initial pressure increase at temperatures of ~200 K; this is essential for obtaining a good sample (Methods). The Raman spectra of H2S and D2S were measured as the pressure was increased, and were in general agreement with the literature data’”’* (Extended Data Fig. 1). The sample starts to conduct at P ~ 50 GPa. At this pressure it is a semiconductor, as shown by the temperature dependence of the resistance and pronounced photoconductivity. At 90-100 GPa the resistance drops further, and the temperature dependence becomes metallic. No photoconductive response is observed in this state. It is a poor metal: its resistivity at ~100 K is ρ ~ 3 × 10° ohm m at 110 GPa and ρ ~ 3 × 10°’ ohm m at ~200 GPa.

During the cooling of the metal at pressures of about 100 GPa (Fig. 1a), the resistance abruptly drops by three to four orders of magnitude, indicating a transition to the superconducting state. On further increasing the pressure at low temperatures (T < 100 K), Tc steadily increases with pressure. However, at pressures >160 GPa, Tc increases sharply (Fig. 1b). As higher temperatures of 150-250 K were involved in this pressure range, we supposed that the increase of Tc and the decrease of sample resistance during warming (Fig. 1a) could indicate a kinetically controlled phase transformation. Therefore, in further experiments, after loading and the initial pressure increase at 200 K, we annealed all samples by heating them to room temperature (or above) at pressures of >~150 GPa (Fig. 2a; see also Extended Data Fig. 2). This allowed us to obtain stable results, to compare different isotopes, to obtain the dependence of Tc on pressure and magnetic field, and to prove the existence of superconductivity in our samples as follows. (We note that additional information on experimental conditions is given in the appropriate figure legends.)


¹Max-Planck-Institut für Chemie, Hahn-Meitner-Weg 1, 55128 Mainz, Germany. ²Institut für Anorganische Chemie und Analytische Chemie, Johannes Gutenberg-Universität Mainz, Staudingerweg 9, 55099 Mainz, Germany.
Figure 1 | Temperature dependence of the resistance of sulfur hydride measured at different pressures, and the pressure dependence of Tc. a, Main panel, temperature dependence of the resistance (R) of sulfur hydride at different pressures. The pressure values are indicated near the corresponding plots. At first, the sample was loaded at T ~ 200 K and the pressure was increased to ~100 GPa; the sample was then cooled down to 4 K. After warming to ~100 K, pressure was further increased. Plots at pressures <135 GPa have been scaled down for easier comparison with the higher-pressure steps, as follows: 105 GPa by a factor of 10; 115 GPa and 122 GPa by a factor of 5; and 129 GPa by a factor of 2. The resistance was measured with a current of 10 µA. Bottom panel, the resistance plots near zero. The resistance was measured with four electrodes deposited on a diamond anvil that touched the sample (top panel inset). The diameters of the samples were ~25 µm and the thickness was ~1 µm. b, Blue round points represent values of Tc determined from a. Other blue points (triangles and half circles) were obtained in similar runs. Measurements at P > ~160 GPa revealed a sharp increase of Tc. In this pressure range the R(T) measurements were performed over a larger temperature range, up to 260 K; the corresponding experimental points for two samples are indicated by adding a pink colour to half circles and a centred dot to filled circles. These points probably reflect a transient state for these particular P/T conditions. Further annealing of the sample at room temperature would require stabilizing the sample (Fig. 2a). Black stars are calculations from ref. 10. Dark yellow points are Tc values of pure sulfur obtained with the same four-probe electrical measurement method. They are consistent with literature data’? (susceptibility measurements) but have higher values at P > 200 GPa.

Figure 2 | Pressure and temperature effects on Tc of sulfur hydride and sulfur deuteride. a, Changes of resistance and Tc of sulfur hydride with temperature at constant pressure: the annealing process. The sample was pressurized to 145 GPa at 220 K and then cooled to 100 K. It was then slowly warmed at ~1 K min⁻¹, and Tc = 170 K was determined. At temperatures above ~250 K the resistance dropped sharply, and during the next temperature run Tc increased to ~195 K. This Tc remained nearly the same for the next two runs. (We note that the only point for sulfur deuteride presented in ref. 9 was determined without sample annealing, and Tc would increase after annealing at room temperature.) b, Typical superconductive steps for sulfur hydride (blue trace) and sulfur deuteride (red trace). The data were acquired during slow warming over a time of several hours. Tc is defined here as the sharp kink in the transition to normal metallic behaviour. These curves were obtained after annealing at room temperature as shown in a. c, Dependence of Tc on pressure; data on annealed samples are presented. Open coloured points refer to sulfur deuteride, and filled points to sulfur hydride. Data shown as the magenta point were obtained in magnetic susceptibility measurements (Fig. 4a). The lines indicate that the plots are parallel at pressures above ~170 GPa (the isotope shift is constant) but strongly deviate at lower pressures.

Figure 3 | Temperature dependence of the resistance of sulfur hydride in different magnetic fields. a, The shift of the ~60 K superconducting transition in magnetic fields of 0-7 T (colour coded). The upper and lower parts of the transition are shown enlarged in the insets (axes as in main panel). The temperature dependence of the resistance without an applied magnetic field was measured three times: before applying the field, after applying 1, 3, 5 and 7 T, and finally after applying 2, 4 and 6 T (black, grey and dark grey colours). b, The same measurements but for the 185 K superconducting transition. c, The temperature dependence of the critical magnetic field strengths of sulfur hydride. Tc values (black points deduced from a and b) are plotted for the corresponding magnetic fields. To estimate the critical magnetic field Hc, the plots were extrapolated to high magnetic fields using the formula Hc(T) = Hc0(1 − (T/Tc)²). The extrapolation has been done with 95% confidence (band shown as grey lines).

(1) There is a sharp drop in resistivity with cooling, indicating a phase transformation. The measured minimum resistivity is at least as low as ~10-'* ohm m, about two orders of magnitude less than for pure copper (Fig. 1, Extended Data Fig. 3e) measured at the same temperature’’. (2) A strong isotope effect is observed: Tc shifts to lower temperatures for sulfur deuteride, indicating phonon-assisted superconductivity (Fig. 2b, c). The BCS theory gives the dependence of Tc on atomic mass m as Tc ∝ m^(−α), where α ≈ 0.5. Comparison of Tc values in the pressure range P > 170 GPa (Fig. 2c) gives α ≈ 0.3. (3) Tc shifts to lower temperatures in an applied magnetic field (B), up to the maximum available field of 7 T (Fig. 3). Much higher fields are required to destroy the superconductivity: extrapolation of Tc(B) gives an estimate of a critical magnetic field as high as 70 T (Fig. 3). (4) Finally, in magnetic susceptibility measurements (Fig. 4), a sharp transition from the diamagnetic to the paramagnetic state (Fig. 4a) was observed for zero-field-cooled (ZFC) material. The onset temperature of the superconducting state is Tonset = 203(1) K, and the width of the superconducting transition is nearly the same as in electrical measurements (Fig. 4a). Magnetization measurements M(H), where H is the magnetic field, at different temperatures (Fig. 4c) revealed a pronounced hysteresis indicating type II superconductivity, with a first critical field Hc1 ~ 30 mT. The magnetization decreases sharply at temperatures above 200 K, showing the onset of superconductivity at 203.5 K, in agreement with the susceptibility measurements (Fig. 4a). A list of key properties of the new superconductor is given in Methods.
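Points (2) and (3) rest on two short calculations: extracting the isotope coefficient α from Tc ∝ m^(−α) using a hydride/deuteride pair, and extrapolating Tc(B) with the empirical form Hc(T) = Hc0(1 − (T/Tc)²) used in Fig. 3c. The Python sketch below implements both. It assumes that the relevant mass ratio is that of deuterium to hydrogen (taken as 2), and the example numbers are illustrative, not the measured data. Because the empirical form is linear in T², the fit reduces to a straight-line fit of H against T².

```python
import math

import numpy as np


def isotope_coefficient(tc_light: float, tc_heavy: float, mass_ratio: float = 2.0) -> float:
    """alpha in Tc ∝ M**(-alpha), from the Tc of a light and a heavy isotope.

    mass_ratio = M_heavy / M_light; ~2 for deuterium vs hydrogen (assumption:
    the isotope shift is attributed to the hydrogen sublattice).
    """
    return -math.log(tc_heavy / tc_light) / math.log(mass_ratio)


def critical_field(t, tc: float, hc0: float):
    """Empirical Hc(T) = Hc0 * (1 - (T/Tc)**2)."""
    return hc0 * (1.0 - (np.asarray(t, dtype=float) / tc) ** 2)


def fit_critical_field(t, h):
    """Fit H = a + b*T**2 (linear in T**2); return (Hc0, Tc) = (a, sqrt(-a/b))."""
    t = np.asarray(t, dtype=float)
    b, a = np.polyfit(t**2, np.asarray(h, dtype=float), 1)  # slope, intercept
    return a, math.sqrt(-a / b)


# Illustrative inputs only, not the measured values.
alpha = isotope_coefficient(200.0, 160.0)
t_demo = np.array([150.0, 170.0, 185.0, 195.0])  # K
h_demo = critical_field(t_demo, tc=200.0, hc0=70.0)  # T
hc0_fit, tc_fit = fit_critical_field(t_demo, h_demo)
print(f"alpha = {alpha:.2f}; Hc0 = {hc0_fit:.1f} T, Tc = {tc_fit:.1f} K")
```

For example, illustrative values of Tc = 200 K for the hydride and 160 K for the deuteride give α ≈ 0.32.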

Figure 4 | Magnetization measurements. a, Temperature dependence of the magnetization of sulfur hydride at a pressure of 155 GPa in zero-field-cooled (ZFC) and 20 Oe field-cooled (FC) modes (black circles). The onset temperature is Tonset = 203(1) K. For comparison, the superconducting step obtained for sulfur hydride from electrical measurements at 145 GPa is shown by red circles. Resistivity data (Tonset = 195 K) were scaled and shifted vertically for comparison with the magnetization data. Inset, optical micrograph of a sulfur hydride sample at 155 GPa in a CaSO4 gasket (scale bar 100 µm). The high Tonset = 203 K measured from the susceptibility can be explained by a significant contribution to the signal from the periphery of the sample, which expanded beyond the culet where the pressure is lower than in the culet centre (Tc increases with decreasing pressure (Fig. 2b)). b, Non-magnetic diamond anvil cell (DAC) of diameter 8.8 mm. c, Magnetization measurements M(H) of sulfur hydride at a pressure of 155 GPa at different temperatures (given as curve labels). The magnetization curves show hysteresis, indicating a type II superconductor. The magnetization curves are, however, distorted by an obvious paramagnetic contribution (which is also observed in other superconductors”’). In our case, the paramagnetic signal probably comes from the DAC, but further study of its origin is required. The paramagnetic background increases as the temperature decreases. The minima of the magnetization curves (~35 mT) result from the diamagnetic contribution of superconductivity combined with the paramagnetic background. The first critical field Hc1 ~ 30 mT can be roughly estimated as the point where the magnetization deviates from linear behaviour. At higher fields, the magnetization increases due to the penetration of magnetic vortices. When the sign of the field change reverses, the magnetic flux in the Shubnikov phase remains trapped, and therefore the back run (that is, with decreasing field) is irreversible: the returning branch of the magnetic cycle (shown by filled points) runs above the direct one. The hysteretic behaviour of the magnetization becomes more clearly visible as the temperature decreases. d, At high temperatures T > 200 K, the magnetization decreases sharply. e, Extrapolation of the pronounced minima of the magnetization curves to higher temperatures gives the onset of superconductivity at T = 203.5 K.

We have presented purely experimental evidence of superconductivity in sulfur hydride. However, the particular compound responsible for the high Tc is not obvious. The superconductivity measured in the low-temperature runs (Fig. 1) possibly relates to H2S, as it is generally consistent with calculations¹⁰ for H2S, in both the value of Tc ~ 80 K and its pressure behaviour. However, superconductivity with Tc ~ 200 K (Fig. 2) does not follow from these calculations. We suppose that it relates to the decomposition of H2S, as high temperatures are required to reach the high Tc (Fig. 2b). Precipitation of elemental sulfur on decomposition could be expected (which is well known at low pressures of P < 100 GPa; ref. 14); however, the superconducting transition in elemental sulfur occurs at significantly lower temperatures (Fig. 1b). Another expected product of the decomposition of H2S is hydrogen. However, the strong characteristic vibrational stretching mode of the H2 molecule was never observed in our Raman spectra (nor was it observed in ref. 14). Therefore we suppose that the dissociation of H2S is different and involves the creation of higher hydrides, such as 3H2S → H6S + 2S or 2H2S → H4S + S. It is natural to expect these reactions, as sulfur is not only divalent but can also exhibit higher valencies. In fact, calculations¹⁰ indirectly support this hypothesis, as the dissociation H2S → H2 + S was shown to be energetically very unfavourable. We found further theoretical support in ref. 20. In that work, the van der Waals compound”! (H2S)2H2 was considered, and it was shown that at pressures above 180 GPa it forms an Im-3m structure with H3S stoichiometry. The predicted Tc ~ 190 K and its pressure dependence are close to our experimental values (Fig. 2c). Our hypothesis of the transformation of H2S to higher hydrides (in the H3S stoichiometry each S atom is surrounded by 6 hydrogen atoms) is strongly supported by further calculations”. All the numerous works based on the Im-3m structure**~’ are consistent in their prediction of Tc > ~200 K, which decreases with pressure. The hydrogen sublattice gives the main contribution to superconductivity®””®. Inclusion of zero-point vibrations and anharmonicity in the calculations” corrected the calculated Tc to ~190 K, and the isotope coefficient from α = 0.5 to α = 0.35, both in agreement with the present work.
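The proposed decomposition routes can be checked for atom balance mechanically. The small parser below is an illustrative helper, not part of the paper's analysis; it handles only formulas without parentheses and counts H and S on each side of the reactions quoted above.

```python
import re
from collections import Counter


def parse_side(side: str) -> Counter:
    """Count atoms on one side of a reaction, e.g. '3H2S' or 'H6S + 2S'."""
    total = Counter()
    for term in side.split("+"):
        term = term.strip()
        # Optional stoichiometric coefficient, then element symbols with counts.
        m = re.fullmatch(r"(\d*)((?:[A-Z][a-z]?\d*)+)", term)
        if m is None:
            raise ValueError(f"cannot parse {term!r}")
        coeff = int(m.group(1) or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", m.group(2)):
            total[element] += coeff * int(count or 1)
    return total


def is_balanced(reaction: str) -> bool:
    """True if both sides of 'left -> right' contain the same atoms."""
    left, right = reaction.split("->")
    return parse_side(left) == parse_side(right)


for reaction in ("3H2S -> H6S + 2S", "2H2S -> H4S + S", "H2S -> H2 + S"):
    print(reaction, "balanced:", is_balanced(reaction))
```

Atom balance says nothing about energetics: the text's point is that H2S → H2 + S, although balanced, was calculated to be very unfavourable.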

The highest Tc of 203 K reported here was most probably achieved
in H3S with the Im-3m structure. It is a good metal; interestingly,
there is also strong covalent bonding between the H and S atoms in
this compound. This agrees with the general assumption (see
for instance ref. 28) that a metal with high Tc should have strong
covalent bonding (as realized in MgB2; ref. 29) together with
high-frequency modes in the phonon spectrum. This particular combination
of bonding type and phonon spectrum would probably provide a good
criterion when searching for materials with high Tc at ambient
pressure, as required for applications. There are many
hydrogen-containing materials with strong covalent bonding (such as
organics), but they are typically insulators. In principle, they could
be tuned to a metallic state by doping or gating. Modern methods of
structure prediction could facilitate the search for such materials.
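The link between high-frequency phonons and high Tc can be made quantitative with the standard Allen–Dynes (modified McMillan) formula. The text does not use this formula; it is added here only to illustrate the criterion. At fixed electron–phonon coupling λ and Coulomb pseudopotential μ*, Tc scales linearly with the logarithmic average phonon frequency ω_log. The parameter values below are placeholders, not values for H3S.

```python
import math

def allen_dynes_tc(omega_log_K: float, lam: float, mu_star: float = 0.1) -> float:
    """Allen-Dynes estimate of Tc in kelvin (without the f1*f2 strong-coupling factors).

    Tc = (omega_log / 1.2) * exp(-1.04 (1 + lam) / (lam - mu_star (1 + 0.62 lam)))
    """
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0:
        return 0.0  # formula predicts no superconductivity for these parameters
    return omega_log_K / 1.2 * math.exp(-1.04 * (1.0 + lam) / denom)

# Same hypothetical coupling, two phonon scales: Tc grows in proportion to omega_log.
for omega_log in (300.0, 1200.0):
    print(omega_log, round(allen_dynes_tc(omega_log, lam=2.0, mu_star=0.13), 1))
```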


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Experimental procedure. For electrical measurements we used diamond anvil
cells (DACs) with anvils of the following shape: tip diameter of 200–300 μm,
bevelled at 7–8° to a culet of 40–80 μm. An insulating gasket is required to
separate the metallic gasket from the electrodes. It was prepared as follows
(Extended Data Fig. 3). First, a 250-μm-thick metallic gasket of T301 stainless
steel (or Re) was indented to a pressure of about 17–20 GPa. Then the bottom of
the imprint (diameter ~200 μm) was drilled out, and a powdered insulating
material was put into the imprint and pressed between the anvils to form a
layer. The insulating layer was made of Teflon, NaCl or CaSO4, as these
materials do not react with H2S. The layer was pressed to a thickness of
~3–5 μm in the centre to provide stable clamping. A thicker layer makes the
sample unstable (it shifts or escapes under pressure), whereas with a thinner
gasket it is difficult to reach high pressures. A hole of diameter ~10–30 μm
was then drilled in the insulating layer. Four Ti electrodes were sputtered on
the diamond anvil and capped with Au to prevent oxidation of the Ti. (To check
a possible contribution of the diamond surface to the conductivity, we prepared
a different electrode configuration for a single experiment: two electrodes
were sputtered on one anvil and the other two on the opposite anvil, similar to
ref. 13.) After preparation of the electrodes, the gasket was put back on the
anvil and the DAC was assembled so that the separation between the anvils was
about 20–100 μm (measured from interference fringes). The DAC was placed into
a cryostat and cooled down to ~200 K (within the temperature range of liquid
H2S), and H2S gas was then fed through a capillary into a rim around the
diamond anvil, where it liquefied (Extended Data Fig. 4). H2S of 99.5% purity
and D2S of 97% purity were used. The filling was monitored visually (Extended
Data Figs 4, 5) and the sample was identified by measuring Raman spectra. The
liquid H2S was then clamped in the gasket hole by pushing the piston of the DAC
with screws operated from outside the cryostat. The sample thickness, estimated
from interference spectra measured through the clamped transparent sample, is a
few micrometres; it might be ~1 μm if the sample expanded over the culet
(Fig. 4). After clamping, the DAC was heated to ~220 K to evaporate the
remaining H2S, and the pressure was then further increased at this temperature.
The pressure remained stable within ±5 GPa during cooling. The pressure was
determined from the diamond edge scale at room temperature and at low
temperatures (ref. 32). For optical measurements, a Raman spectrometer was
equipped with a nitrogen-cooled CCD and notch filters. The 632.8 nm line of a
He-Ne laser was used to excite the Raman spectra and to determine the pressure.
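The anvil separation and the sample thickness above were obtained from interference fringes. The usual relation for a thin parallel gap is d = 1/(2nΔσ), where Δσ is the spacing between adjacent fringe maxima in wavenumbers. The sketch below assumes n ≈ 1 for an empty gap. For the clamped sample, the refractive index of H2S at that pressure would be needed, and the text does not give it. The fringe positions in the usage line are hypothetical.

```python
def fringe_spacing(peaks_cm1):
    """Average spacing (cm^-1) between consecutive interference maxima."""
    peaks = sorted(peaks_cm1)
    if len(peaks) < 2:
        raise ValueError("need at least two fringe maxima")
    return (peaks[-1] - peaks[0]) / (len(peaks) - 1)

def gap_thickness_um(spacing_cm1: float, n: float = 1.0) -> float:
    """Thickness d = 1 / (2 n * spacing) of a parallel gap, returned in micrometres."""
    if spacing_cm1 <= 0 or n <= 0:
        raise ValueError("fringe spacing and refractive index must be positive")
    d_cm = 1.0 / (2.0 * n * spacing_cm1)
    return d_cm * 1e4  # 1 cm = 1e4 um

# Hypothetical fringe maxima 100 cm^-1 apart -> a 50 um empty gap.
print(gap_thickness_um(fringe_spacing([15000.0, 15100.0, 15200.0, 15300.0])))
```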

Low-temperature loading seems to be required to prepare samples with high
Tc. If H2S was loaded at room temperature in the gas loader, for example, only
sulfur was detected by Raman and X-ray scattering. Apparently, along this route
the sample decomposes before reaching the required high-pressure H3S phase. We
did not explore all (P, T) paths to the high-Tc state. We found, however,
that superconductivity is not observed in samples loaded at ~200 K but heated
to room temperature at low pressure (below ~100 GPa).

The resistance and Raman spectra were measured during pressurization; the
resistance was measured by the four-probe van der Pauw method (Extended Data
Fig. 3) with a current of 10–10,000 μA. The temperature was reliably
determined by using a slow warming rate (~1 K min^-1) and allowing the DAC to
equilibrate with the attached thermometer. The Tc values were well reproduced
in measurements with the PPMS6000 (Physical Property Measurement System from
Quantum Design) and other set-ups. Tc was determined as the point of steepest
change of resistance from the normal state (Fig. 2b).
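One simple way to implement the "steepest change of resistance" criterion is to take the temperature at which |dR/dT| is largest, using central differences. This is a minimal sketch; the authors' exact procedure (for example, any smoothing of the data) is not described. The tanh-shaped R(T) curve below is synthetic and is used only to exercise the function.

```python
import math

def tc_steepest_slope(temps, resistances):
    """Temperature at which |dR/dT| is largest (central differences, endpoints skipped).

    temps must be strictly increasing and the same length as resistances.
    """
    if len(temps) != len(resistances) or len(temps) < 3:
        raise ValueError("need at least three (T, R) points of equal length")
    best_t, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = abs((resistances[i + 1] - resistances[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic transition centred at 200 K with a 2 K width, sampled every 0.5 K.
temps = [150 + 0.5 * i for i in range(201)]
res = [0.5 * (1 + math.tanh((t - 200) / 2)) for t in temps]
print(tc_steepest_slope(temps, res))  # 200.0
```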

The influence of the magnetic field on superconducting transitions has been 
measured with a non-magnetic DAC (diameter 25 mm) in a PPMS6000 in a 4-300 
K temperature range and fields up to 7 T. 

Magnetic susceptibility measurements were performed in an MPMS (Magnetic
Property Measurement System) from Quantum Design. For these measurements
a miniature non-magnetic cell made of Cu:Ti alloy, working up to 200 GPa,
was designed (Fig. 4b). Samples of diameter ~50–100 μm and a thickness of a
few micrometres were prepared to provide a sufficient signal. The background
signal of the high-pressure cell was removed using the background-subtraction
feature of the MPMS SQUID magnetometer software (Extended Data Fig. 6).

Results. We present here the key features of our new high-Tc sulfur
hydride superconductor:

(1) The new superconductor is of type II. This is clearly supported by (i) the
difference between the temperature-dependent zero-field-cooled (ZFC) and
field-cooled (FC) magnetization (Fig. 4a), which is due to the Meissner effect
(ZFC) and magnetic flux capture when the sample is cooled down from its normal
state (FC); and (ii) the magnetic hysteresis curves (Fig. 4c, d). The magnetic
hysteresis curves also have all the features of typical type II
superconductors, with a mixed state between Hc1 and Hc2.

(2) A typical value of the coherence length ξGL in the framework of the
Ginzburg–Landau theory can be estimated from the upper critical fields
obtained in conductivity measurements (Fig. 3c). Using the experimental
estimate 60 T < Hc2 < 80 T and the relation

$$\xi_{GL} = \sqrt{\frac{\Phi_0}{2\pi H_{c2}}}, \qquad \Phi_0 = \frac{h}{2e},$$

with Hc2 expressed in tesla, we find limits for the coherence length:
2.3 nm > ξGL > 2.0 nm. We note that this relatively short coherence length is
of the same order as, for instance, the values for superconducting YBa2Cu3O7
(1.3 nm) and Nb3Sn (3.5 nm).
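The quoted bounds on ξGL follow directly from the relation above. The sketch below uses the exact SI values of h and e and treats Hc2 as a field in tesla, in line with the text.

```python
import math

H_PLANCK = 6.62607015e-34   # J s (exact in the 2019 SI)
E_CHARGE = 1.602176634e-19  # C (exact in the 2019 SI)
PHI_0 = H_PLANCK / (2 * E_CHARGE)  # superconducting flux quantum, about 2.068e-15 Wb

def coherence_length_nm(hc2_tesla: float) -> float:
    """Ginzburg-Landau coherence length xi = sqrt(Phi0 / (2 pi Hc2)), in nanometres."""
    return math.sqrt(PHI_0 / (2 * math.pi * hc2_tesla)) * 1e9

for hc2 in (60.0, 80.0):
    print(f"Hc2 = {hc2:.0f} T -> xi_GL = {coherence_length_nm(hc2):.2f} nm")
# prints about 2.34 nm for 60 T and 2.03 nm for 80 T
```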

(3) The London penetration depth λL can be estimated from the known relation
between the lower critical field Hc1 and the upper critical field Hc2 of a
type II superconductor,

$$\frac{H_{c1}}{H_{c2}} = \frac{\ln\kappa}{2\kappa^{2}},$$

valid in the limit κ ≫ 1 of the Ginzburg–Landau parameter κ = λL/ξGL.
Considering the experimental value of the lower critical field, 3 × 10^-2 T
(Fig. 4c), and the above-mentioned range 60 T < Hc2 < 80 T, we obtain the
following estimate for the London penetration depth: λL ~ 125 nm.

(4) According to Bean's model, the magnetic critical current density of the
superconductor can be estimated from the distance between the direct and the
returning branches of the magnetic hysteresis loop at a given magnetic field
(Fig. 4c). Assuming grain radii of about 0.1 μm, the intra-grain critical
current density Jc is about 10^7 A cm^-2.
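Bean's estimate depends on the sample geometry. The sketch below assumes grains modelled as long cylinders in an axial field, for which Jc = 3ΔM/d in SI units. This is equivalent to the familiar Jc = 30ΔM/d with ΔM in emu cm^-3, d in cm and Jc in A cm^-2. The text does not state which geometry factor was used. The loop width ΔM below is a hypothetical input. For a powder-like sample it would also have to be corrected for the superconducting volume fraction.

```python
def bean_jc_cylinder(delta_m_a_per_m: float, diameter_m: float) -> float:
    """Bean critical-state estimate for a long cylinder in an axial field.

    delta_m_a_per_m: width of the magnetization loop (M_down - M_up) in A/m.
    Returns Jc in A/m^2 from Jc = 3 * dM / d.
    """
    if diameter_m <= 0:
        raise ValueError("diameter must be positive")
    return 3.0 * delta_m_a_per_m / diameter_m

def to_a_per_cm2(j_a_per_m2: float) -> float:
    return j_a_per_m2 * 1e-4  # 1 m^2 = 1e4 cm^2

# Hypothetical loop width; grains of radius 0.1 um (diameter 0.2 um) as in the text.
jc = bean_jc_cylinder(1.0e4, 0.2e-6)
print(f"Jc ~ {to_a_per_cm2(jc):.1e} A cm^-2")
```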


32. Eremets, M. I. Megabar high-pressure cells for Raman measurements. J. Raman
Spectrosc. 34, 515–518 (2003).

33. Landau, L. D. & Lifshitz, E. M. Electrodynamics of Continuous Media Vol. 8,
1st edn, 173 (Pergamon, 1960).
Extended Data Figure 1 | Raman spectra of sulfur hydride at different
pressures. a, Spectra of sulfur hydride at increasing pressure at ~230 K. The
spectra are shifted relative to each other. At 51 GPa there is a phase
transformation, as follows from the disappearance of the characteristic vibron
peaks in the 2,100–2,500 cm^-1 range. The corresponding spectrum is
highlighted as a bold curve. Bold curves at higher pressures (and the
corresponding measurement temperatures) are shown to follow the changes of the
spectra qualitatively. The pressure corresponding to the unlabelled curves can
be determined from the Raman spectra of the stressed diamond anvil (ref. 32).
b, Raman spectra of sulfur deuteride measured at T ~ 170 K over the pressure
range 1–70 GPa.
Extended Data Figure 2 | Temperature dependence of the resistance of
sulfur hydride at 143 GPa. In this run the sample was clamped in the DAC at
T ~ 200 K, and the pressure was then increased to 103 GPa at this temperature;
the further increase of pressure to 143 GPa was made at ~100 K. a, After the
next cooling to ~15 K and subsequent warming, a superconducting transition with
Tc ~ 60 K was observed; the resistance then decreased strongly with increasing
temperature. After successive cooling and warming (b; only the warming curve
is shown), a kink at 185 K appeared, indicating the onset of superconductivity.
The superconducting transition is very broad: the resistance dropped to zero
only at ~22 K. There are apparent 'oscillations' on the slope. Their origin is
not clear, though they probably reflect inhomogeneity of the sample in the
transient state before complete annealing. Similar 'oscillations' have also
been observed for other samples (see, for example, figure 3 in the
Supplementary Information of ref. 9).
Extended Data Figure 3 | Electrical measurements. a, Schematic drawing of
diamond anvils with electrical leads separated from the metallic gasket by an
insulating layer (shown in orange). b, Ti electrodes sputtered on a diamond
anvil, shown in transmitted light. c, Scheme of the van der Pauw measurements:
current leads are indicated by J, and voltage leads by U. d, Typical
superconducting step measured in four channels (for different combinations of
current and voltage leads shown in c). The sum resistance obtained from the van
der Pauw formula is shown by the green line. Note that here the
superconducting transition was measured with an un-annealed sample; after
warming to room temperature and subsequent cooling, Tc should increase.
e, Residual resistance measured below the superconducting transition (d). Rmin
and ρmin are averaged over the four channels, shown by different colours.
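In the van der Pauw method, two four-probe resistances R_A and R_B are measured with the current and voltage contacts permuted around the sample edge. They determine the sheet resistance R_s through exp(−πR_A/R_s) + exp(−πR_B/R_s) = 1, and the resistivity follows as ρ = R_s·t for a sample of thickness t. The caption does not spell out how the "sum resistance" was formed, so this bisection sketch only illustrates the standard formula.

```python
import math

def van_der_pauw_sheet_resistance(r_a: float, r_b: float, rel_tol: float = 1e-12) -> float:
    """Solve exp(-pi*R_A/R_s) + exp(-pi*R_B/R_s) = 1 for R_s by bisection."""
    if r_a <= 0 or r_b <= 0:
        raise ValueError("four-probe resistances must be positive")

    def f(r_s: float) -> float:
        return math.exp(-math.pi * r_a / r_s) + math.exp(-math.pi * r_b / r_s) - 1.0

    lo = math.pi * min(r_a, r_b) / math.log(2)  # here both terms are <= 0.5, so f(lo) <= 0
    hi = math.pi * (r_a + r_b) / math.log(2)    # here the terms sum to >= sqrt(2), so f(hi) > 0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(van_der_pauw_sheet_resistance(1.0, 1.0))  # about 4.532, i.e. pi / ln 2 for a symmetric sample
```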
Extended Data Figure 4 | Loading of H2S. Gaseous H2S is passed through the
capillary into a rim around the diamond anvils (upper panel). When the sample
liquefies, in the temperature range 191 K < T < 213 K, it is clamped. The
process of loading is shown on a video (https://vimeo.com/131914556) and a
still is shown here (lower panel). On the video, the camera is looking through
a hole in the transparent gasket (CaSO4), and shows a view through the diamond
anvil. At T ~ 200 K, the line to the H2S gas cylinder was opened and the gas
condensed. At this moment, the picture changes due to the different refractive
index of H2S. The second anvil with the sputtered electrodes was then pushed
forward, and the hole was clamped. The sample changed colour during the next
application of pressure. The red point is from the focused HeNe laser beam.
Extended Data Figure 5 | View of D2S sample with electrical leads and
transparent gasket (CaSO4) at different pressures. The D2S is in the centre of
these photographs, which were taken in a cryostat at 220 K with mixed
illumination, both transmitted and reflected. Under this illumination, the
insulating transparent gasket appears blue and the electrodes yellow. The red
spot is the focused HeNe laser beam. The sample, which is initially
transparent, becomes opaque and then reflective as pressure is increased.
Extended Data Figure 6 | Magnetic susceptibility measurements with a
SQUID. A typical sample (Fig. 4) has a disk shape (diameter 50–100 μm and
thickness of a few micrometres). In the superconducting state the magnetic
moment of such a disk is estimated as M(disk) ~ 0.2 r^3 H in Gaussian units
(ref. 33). For a disk of radius r = 40 μm (a sample size typical for DACs in
the megabar range) and H = 2 mT, the expected diamagnetic signal M(disk) is
estimated as 2.6 × 10^-7 emu. This value is well above the sensitivity of the
SQUID, which is ~10^-8 emu, and therefore the signal can be detected. A
high-pressure DAC made of Cu:Ti alloy has its own magnetic background signal
(a), which increases sharply at low temperatures owing to residual paramagnetic
impurities. The signal from a large superconducting sample (for example, a
Bi-2223 superconductor) could still be detected without magnetic background
subtraction. However, the sulfur hydride sample is not seen (b) unless the
background has been subtracted (c, d). The background signal acquired in the
normal state immediately above Tonset was used for subtraction over the whole
temperature range, taking into account that the magnetic moment of the DAC is
fairly temperature independent above 100 K. c, Magnetic measurements for the
sample of sulfur hydride at different magnetic fields (labels on curves).
d, The data on sulfur deuteride are compared with the superconducting
transition in resistivity measurements (blue curve), which has been scaled to
fit the susceptibility data (black points).
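The signal estimate in this caption can be reproduced in Gaussian units using 2 mT = 20 Oe and the coefficient 0.2 quoted above. For an infinitely thin superconducting disk in a perpendicular field, the demagnetization-limited coefficient is 2/(3π) ≈ 0.21, which is consistent with that value. The SQUID sensitivity of 1e-8 emu is the figure given in the caption.

```python
import math

def disk_moment_emu(radius_cm: float, field_oe: float, coeff: float = 0.2) -> float:
    """Magnitude of the diamagnetic moment of a thin superconducting disk, M ~ coeff * r^3 * H."""
    return coeff * radius_cm ** 3 * field_oe

r_cm = 40e-4   # 40 um expressed in cm
h_oe = 20.0    # 2 mT = 20 Oe
m = disk_moment_emu(r_cm, h_oe)
print(f"{m:.2e} emu")                                        # 2.56e-07 emu
print(f"margin over 1e-8 emu sensitivity: {m / 1e-8:.0f}x")  # about 26x
print(f"thin-disk coefficient 2/(3 pi) = {2 / (3 * math.pi):.3f}")  # 0.212
```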
Negative refractive index and acoustic superlens from
multiple scattering in single negative metamaterials


Nadège Kaina¹, Fabrice Lemoult¹, Mathias Fink¹ & Geoffroy Lerosey¹


Metamaterials, man-made composite media structured on a scale
much smaller than a wavelength, offer surprising possibilities for
engineering the propagation of waves. One of the most interesting
of these is the ability to achieve superlensing, that is, to focus
or image beyond the diffraction limit. This originates from the
left-handed behaviour (the property of refracting waves negatively)
that is typical of negative index metamaterials. Yet reaching this
goal requires the design of ‘double negative’ metamaterials, which
act simultaneously on the permittivity and permeability in
electromagnetics, or on the density and compressibility in acoustics;
this generally implies the use of two different kinds of building
blocks or specific particles presenting multiple overlapping
resonances. Such a requirement limits the applicability of double
negative metamaterials, and has, for example, hampered any
demonstration of subwavelength focusing using left-handed acoustic
metamaterials. Here we show that these strict conditions can be
largely relaxed by relying on media that consist of only one type of 
single resonant unit cell. Specifically, we show with a simple yet 
general semi-analytical model that judiciously breaking the sym- 
metry of a single negative metamaterial is sufficient to turn it into a 
double negative one. We then demonstrate that this occurs solely 
because of multiple scattering of waves off the metamaterial res- 
onant elements, a phenomenon often disregarded in these media 
owing to their subwavelength patterning. We apply our approach to 
acoustics and verify through numerical simulations that it allows 
the realization of negative index acoustic metamaterials based on 
Helmholtz resonators only. Finally, we demonstrate the operation 
of a negative index acoustic superlens, achieving subwavelength 
focusing and imaging with spot width and resolution 7 and 3.5 
times better than the diffraction limit, respectively. Our findings 
have profound implications for the physics of metamaterials, high- 
lighting the role of their subwavelength crystalline structure, and 
hence entering the realm of metamaterial crystals. This widens the 
scope of possibilities for designing composite media with novel 
properties in a much simpler way than has been possible so far. 
Negative index electromagnetic materials, as predicted in ref. 11,
can be designed by achieving simultaneously negative permittivity ε
and negative permeability μ, although such materials remained
theoretical until metamaterials were proposed. Metamaterials are
subwavelength-scaled composite media in which the excitation field
as well as the response of the medium's unit cells can be averaged, so
that they can be described in terms of effective parameters (εeff, μeff),
which can be negative. It then becomes straightforward to achieve
negative index metamaterials by combining two building blocks, each
bringing either negative ε or negative μ in electromagnetism, or
equivalently negative density ρ or negative compressibility χ in acoustics,
hence potentially allowing superlensing. Another approach to the
design of negative index materials uses two overlapping resonances in
the same frequency range, one monopolar and the other dipolar. These
resonances can originate from the same element, a Mie scatterer for
instance. But they can also arise from elements composed of two
building blocks strongly coupled via inductive or capacitive effects,
and whose symmetry has been broken. This approach is, however,
not universal, since these strong near-field couplings apply only to
specific geometries and materials. On the other hand, photonic/
phononic crystals can also be left-handed, this time solely because of
multiple scattering in a periodic structure that leads to band folding,
thus offering negative refraction. Those crystals, based on Bragg
interference, have, however, the major drawback of a spatial period
comparable to the wavelength, which makes subwavelength resolution
difficult to achieve. We now show that, as in crystals, multiple
scattering has profound consequences in resonant metamaterials
despite their deeply subwavelength scale. This can drastically
simplify the design of negative index metamaterials.

For the sake of simplicity, we start our study with the simplest
possible metamaterial, that is, a one-dimensional chain of resonators
organized periodically on a deeply subwavelength scale a (Fig. 1a).
The building block of the unit cell is modelled by a resonant point
scatterer, with a resonant frequency f0 and a linewidth Γ (see
Methods). This approach is hence valid for any type of wave (acoustic
or electromagnetic) and any type of resonator. The dispersion relation
of such a chain, with a = λ0/12, is calculated analytically using a
combination of a Green's function formalism and a transfer matrix
approach (see Methods). This analytical approach, which includes
multiple scattering, is very general and applies as long as strong
coupling between resonant elements can be neglected, which is valid
for many resonators whatever the spacing, and for any resonator if a
minimum separation distance is maintained. The obtained dispersion
is well known to be polaritonic (Fig. 1a); equivalently, this
medium can be described by a set of two effective parameters (εeff, μeff
in electromagnetism or ρeff, χeff in acoustics) with only one of them
being negative: this is a so-called single negative metamaterial.
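To make the polaritonic dispersion concrete, here is a deliberately simplified one-dimensional sketch. It is not the authors' Green's function method. Each resonator is modelled as a lossless point scatterer whose strength g(f) = G k f²/(f² − f0²) is a hypothetical choice. The chain then obeys the Kronig–Penney relation cos(Ka) = cos(ka) + g sin(ka)/(2k). The spacing a = λ0/12 follows the text; G = 0.1 and the normalization c = f0 = 1 are assumptions. Losses (the linewidth Γ) are ignored.

```python
import math

# Hypothetical toy model: point scatterers imposing u'(x+) - u'(x-) = g(f) u(x),
# with g(f) = G k f^2 / (f^2 - f0^2). Units normalized so that c = 1 and f0 = 1.
C = 1.0
F0 = 1.0
LAMBDA0 = C / F0

def k_free(f):
    """Free-space wavenumber."""
    return 2 * math.pi * f / C

def strength(f, f0=F0, G=0.1):
    """Scatterer strength: strongly negative just below f0, strongly positive just above."""
    k = k_free(f)
    return G * k * f**2 / (f**2 - f0**2)

def half_trace_regular(f, a=LAMBDA0 / 12, G=0.1):
    """cos(K a) for one scatterer per period a (Kronig-Penney with delta scatterers)."""
    k = k_free(f)
    return math.cos(k * a) + strength(f, G=G) * math.sin(k * a) / (2 * k)

def bloch_k(f, a=LAMBDA0 / 12, G=0.1):
    """Bloch wavenumber in [0, pi/a], or None when f lies in a bandgap."""
    h = half_trace_regular(f, a, G)
    if abs(h) > 1:
        return None
    return math.acos(h) / a

for f in (0.9, 1.01, 1.5):
    K = bloch_k(f)
    print(f, "bandgap" if K is None else f"K/k = {K / k_free(f):.2f}")
```

In this model the chain is slow-wave (K/k > 1) below f0 and opaque just above f0, and propagation resumes at higher frequency. This is the single negative, polaritonic picture described above.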

Starting from this unit cell that creates the single negative property, 
we build two new configurations just by breaking the symmetry in two 
different ways: either (1) we build a bi-periodic chain by off-centring 
one resonator out of two or (2) we create a bi-disperse chain by slightly 
shifting the resonance frequency of every other resonator. For both 
configurations, we analytically calculate the dispersion relation and 
extract the corresponding effective refractive index (Fig. 1b, c). Both 
of those new metamaterials now exhibit, in the bandgap of the single 
negative medium, a new band of propagating waves that presents a 
negative index. To confirm the existence of this negative band, we 
evaluated, for each configuration, the effective parameters retrieved 
from the transmission and reflection coefficients of a single unit cell 
and found a negative index of refraction for both (see Methods and 
Extended Data Fig. 1). 
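The two symmetry-broken chains can be sketched with the same hypothetical toy model. The idea is to multiply per-element transfer matrices over one building block of length Λ = 2a. The bi-periodic chain moves the second scatterer to d ≠ Λ/2; the bi-disperse chain detunes its resonance to f1. Half the trace of the block matrix equals cos(KΛ) in a passband. The helper `negative_slope_band` reports frequencies where the reduced-zone Bloch wavenumber decreases with frequency, meaning the group velocity is opposite to the phase velocity. The scan starts just above f0, inside the gap of the regular chain. This is because, with a doubled period, the lower polariton band also folds back below f0 and would show a decreasing K trivially. As the Fig. 2 caption notes, at d = a and f1 = f0 such folding is only an artefact. This sketch is not claimed to reproduce Fig. 1 quantitatively.

```python
import math

# Hypothetical toy model (not the authors' formalism): lossless point scatterers
# imposing u'(x+) - u'(x-) = g(f) u(x), g(f) = G k f^2 / (f^2 - f_res^2); c = f0 = 1.

def k_free(f):
    return 2 * math.pi * f

def strength(f, f_res, G=0.1):
    k = k_free(f)
    return G * k * f**2 / (f**2 - f_res**2)

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]]

def propagate(f, length):
    """Free-space transfer matrix acting on the state vector (u, u')."""
    k = k_free(f)
    c, s = math.cos(k * length), math.sin(k * length)
    return [[c, s / k], [-k * s, c]]

def scatter(f, f_res, G=0.1):
    """Jump in u' across one point scatterer; u itself is continuous."""
    return [[1.0, 0.0], [strength(f, f_res, G), 1.0]]

def dimer_matrix(f, period, d, f_res1=1.0, f_res2=1.0, G=0.1):
    """Transfer matrix over one building block holding two scatterers a distance d apart.

    Bi-periodic chain: f_res1 == f_res2 and d != period / 2.
    Bi-disperse chain: d == period / 2 and f_res2 != f_res1.
    """
    m = scatter(f, f_res1, G)
    m = matmul(propagate(f, d), m)
    m = matmul(scatter(f, f_res2, G), m)
    return matmul(propagate(f, period - d), m)

def half_trace(m):
    return 0.5 * (m[0][0] + m[1][1])  # equals cos(K * period) inside a passband

def negative_slope_band(freqs, period, d, f_res1=1.0, f_res2=1.0, G=0.1):
    """Frequencies at which the reduced-zone Bloch wavenumber K(f) decreases with f."""
    found, prev_K = [], None
    for f in freqs:
        h = half_trace(dimer_matrix(f, period, d, f_res1, f_res2, G))
        K = math.acos(h) / period if abs(h) <= 1 else None
        if K is not None and prev_K is not None and K < prev_K:
            found.append(f)
        prev_K = K
    return found

period = 1 / 6                                      # building block of length lambda0 / 6
freqs = [1.0 + 0.00017 * i for i in range(1, 400)]  # just above f0, avoiding the poles
print("bi-periodic:", negative_slope_band(freqs, period, 0.8193 * period)[:5])
print("bi-disperse:", negative_slope_band(freqs, period, period / 2, 1.0, 1.0122)[:5])
```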

To understand the origin of this negative band, we perform a 
parametric study of these symmetry-broken formal metamaterials. 
For the bi-periodic chain, we vary the shift in position, while for the 
bi-disperse one we vary the frequency detuning between resonators. 
For each parameter, we analytically evaluate the dispersion relation 
with our approach that takes multiple scattering into account and 
extract the effective index of the metamaterials from it. We display 
those results in a colour-coded representation (Fig. 2). The bi-periodic
chain presents a negative index band, independently of the spacing,
which is symmetric with respect to the shift in position. In the
bi-disperse case, however, the existence of this negative band depends
strongly on the frequency detuning, suggesting that multiple scattering
may be involved. Relying on this intuition, we also extract, for the same
set of parameters, the effective index in the way that is frequently
done in the field of metamaterials, namely with an independent scattering
approximation (ISA). This time we never retrieve any negative
index, but rather two positive index bands typical of a double
polariton. This clearly shows that the negative index arises from multiple
scattering between the resonators of the unit cell, even if the
distance is far below the wavelength. This, in turn, explains why, for
the bi-disperse chain, the existence of the negative index depends
strongly on the chosen detuning: for too large a resonance frequency
mismatch, the two resonators can no longer couple through multiple
scattering.

Figure 1 | Analytical study of one-dimensional chains of resonant point
scatterers. a, Top, schematic view of the regular one-dimensional chain of
resonant scatterers (green spheres) resonating at f0, with f0 arbitrarily set to
0.15 GHz. The chain consists of building blocks of period λ0/12 (dotted black
lines), each containing one resonant scatterer. Middle panel, the corresponding
dispersion relation, polaritonic as expected. Bottom panel, the real part of
the index. Also shown are free-space properties (dashed red lines). Frequencies
are normalized by the resonance frequency f0 of the scatterers, while the
wavevectors are limited to the first Brillouin zone. The shaded areas indicate
the bandgaps, where the wavevector and index turn imaginary. The same study
is conducted for the bi-periodic chain (b) with two identical scatterers
separated by a distance d (dashed orange lines) per building block (dashed red
lines) of period Λ = λ0/6. The distance d is arbitrarily set to 0.8193Λ. Inset,
an enlargement of the negative band. In the case of the bi-disperse chain (c)
with two scatterers having different resonance frequencies f0 and f1
(represented as spheres with slightly different radii) per unit cell of period
Λ = λ0/6, the same behaviour is found. The curves are plotted for
f1 = 1.0122f0. Inset, enlargement of the negative band.

Figure 2 | Importance of multiple scattering within the building block.
a, Building block of the bi-periodic one-dimensional chain with the two
characteristic dimensions d and Λ = 2a ('unit cell'). Panels under 'multiple
scattering' show the index of the bi-periodic chain as a function of the
normalized frequency when scanning the distance d from 0 to Λ, confirming
the emergence of a negative band (blue) that is, as expected, symmetric about
the distance a, which corresponds to the regular chain. The black areas
indicate the bandgaps, while the reddish ones represent the first and third
positive index bands of the dispersion relation. Left panel, wide scan; right
panel, enlargement of the frequency range where the negative band occurs. The
panel under 'ISA' shows the index for the same chain calculated using the
independent scattering approximation (ISA), leading to only positive index
bands, uniformly, whatever the distance d. b, As for a but for the bi-disperse
chain while varying the detuning frequency f1 of the second scatterer, keeping
f0 fixed. Contrary to the bi-periodic chain, however, the second band is of
negative index only within a certain range of detuning close to f0 and turns
positive for too large a detuning f1/f0. Once again, the independent
scattering approximation (right) leads to only positive index bands. Note that
the cases d = a and f0 = f1 are mathematical band-folding artefacts. The colour
scales represent the real part of the index of refraction (dimensionless).

To grasp the physics of the approach, we carefully studied the fields 
created by a dimer (the new unit cell of the symmetry-broken media) 
and made the following observation: multiple scattering creates a 
dipolar resonance (the two resonators are out-of-phase) overlapping 
with a monopolar resonance of the dimer. This dipolar resonance is 
responsible for the opening of a narrow transparency window within 
the large out-of-phase response of the monopolar resonance of the two 
resonators constituting the dimer. This is analogous to electromagnetically
induced transparency in quantum physics, or more precisely to
its metamaterial equivalents. We stress that this dipolar resonance
results from multiple scattering, and that the conventional homogen- 
ization procedure based on the independent scattering approximation 
cannot retrieve it. Moving from the unit cell to an infinite medium, this 
dipolar mode gives rise to a band of propagating waves within the 
bandgap of the single negative medium, the latter being a consequence 
of the monopolar resonance of the unit cell. This band has a negative 
slope, or equivalently, the metamaterial now presents a negative index. 
This originates physically from the fact that, owing to the symmetry
breaking, the lower polaritonic band folds into the first Brillouin zone,
analogous to optical branches in diatomic crystals or band folding in
photonic or phononic crystals. Here, though, the band folding, owing
to the change of sign of the Bloch mode between the two edges of the 
unit cell, has a different origin in those metamaterials compared to 
Bragg interference based crystals. Indeed, while in the latter it arises 
from the fact that the host medium wavelength becomes smaller than 
twice the lattice constant, in the former the change of sign results from 
the dipolar nature of the resonant mode within the unit cell. This 
implies that, contrary to negative refraction in crystals, this new 
phenomenon exists even if the scale of the metamaterial is deeply 
subwavelength, and happens at the same frequency as the resonance 
of the original building block. As a consequence, while negative refrac- 
tion in photonic/phononic crystals is inherently diffraction limited, 
our concept permits subdiffraction resolution, as we will show later on. 
Furthermore, since the original single negative effective property does
not rely on spatial order, this negative index band should be robust
even in a metamaterial constituted of randomly placed dimers. Indeed,
both the monopolar and the multiple-scattering-induced dipolar
resonances should remain, hence leading to a negative index medium.
We finally note that the two symmetry-broken metamaterials
are strictly equivalent under a transformation approach.

Since our idea is very general, we apply it to acoustics, designing a
two-dimensional acoustic metamaterial whose unit cell is made of soda
(soft drink) cans, that is, acoustic Helmholtz resonators. First, to
Figure 3 | Numerical simulations of soda-can two-dimensional regular and
symmetry-broken media. a, Main panel, dispersion relation in the ΓM
direction (dotted black lines) of a regular triangular array along with
free-space dispersion (dashed blue line); top panel, schematic of the medium and
building block with one can per unit cell (shaded green). b, As a but for a
corresponding symmetry-broken medium, that is, the bi-periodic honeycomb
array, with the building block now consisting of two cans. This medium can
be decomposed into two triangular lattices with cans slightly off-centred. The
negative index band in the dispersion relation is plotted in red (left panel).
Inset, enlargement of the negative band in the ΓM (red) and ΓK (green)
directions. In addition, a surface plot of the negative band over the whole
Brillouin zone is shown (right panel). c, d, As for a, b but for the regular
square lattice and the corresponding symmetry-broken medium, that is, the
bi-disperse square lattice. The building block of the bi-disperse square
lattice can be shown schematically as a superposition of two regular square
lattices: the first with one can (f0) in the centre, and the second with one
quarter of a can displayed in each corner (f1). The specific directions of the
square Brillouin zone are ΓM (red) and ΓX (green).
realize the bi-periodic two-dimensional metamaterial, we use the so-called
honeycomb lattice, which exhibits this double periodicity in any
of the ΓM directions. This 'crystal' is made of a diamond-like unit cell
consisting of two resonators, and is compared to the triangular lattice,
which has the same unit cell but with only one resonator. Numerical
simulations using Comsol Multiphysics give the dispersion of both the
regular and the symmetry-broken lattices (Fig. 3a, b). The triangular
lattice medium presents a polaritonic dispersion relation, or equiva- 
lently it can be modelled as a single negative medium, while the hon- 
eycomb lattice (which actually consists of the superposition of two 
identical triangular lattice crystals) displays a negative band. The dis- 
persion depends slightly on the propagation direction but remains 
rather isotropic as shown in the surface plot (Fig. 3b) and can thus 
be described with an isotropic negative effective index of refraction. For 
the bi-disperse two-dimensional lattice, we mix two square lattices of 
slightly detuned Helmholtz resonators in order to build a new square 
lattice whose unit cell contains two resonators (Fig. 3c, d). This 
bi-disperse resonant crystal exhibits a negative branch although it is 
simply the superposition of two almost identical single negative media. 
In this case, the propagation is less isotropic, since the geometry of a 
square unit cell tends to deform the isofrequency contours near the 
corners of the first Brillouin zone. There is, however, no doubt that one 
can find a more isotropic medium. An effective medium approach 
applied to the specific case of our simulated soda-can unit cells 
(Extended Data Fig. 2) leads to the same results. 

Figure 4 | Experimental demonstration of subwavelength focusing and
imaging using a flat acoustic superlens. a, The flat lens, composed of a
compact hexagonal array of soda cans, is supplied with sound by a loudspeaker
(speaker) centred close to the surface of the medium. Two microphones (mics)
mounted on a stage moving in two dimensions record the acoustic pressure
field less than 1 cm away from the tops of the cans over the whole scanned
area. Absorbers surround the lens to prevent undesired reflections. b, Map of the
real part of the pressure field (colour coded in a.u., arbitrary units) at
f = 417.5 Hz, which is near the edge of the negative band. c, Map of the pressure
intensity field after compensation of losses occurring during the propagation
within the lens; the directions of the refracted waves are plotted as white dashed
lines, clearly displaying the features of negative refraction. d, The normalized
amplitude of the field in the close vicinity of the output surface shows a focusing
area of width λ0/15 (red line), whereas the source (blue line) is λ0/5 wide and the
control experiment without the lens (black line) is λ0/1.2 wide. e–g, As b–d but
for two sources that play sound out of phase to demonstrate super-resolution; the
panels show the same negative refraction features with a resolution of λ0/7. The
dashed red (blue) line in f represents the direction of the refracted waves coming
from the first (second) source.

To show the potential of our approach, we experimentally demonstrate a
negative index acoustic metamaterial superlens. We work with the bi-periodic
medium, here a honeycomb arrangement of soda cans, each one resonating at
f0 = 420 Hz (λ0 ≈ 80 cm). We build a slab with 124 cans (Fig. 4a), surrounded by
acoustic absorbers to avoid reflections of sound off the boundaries of the room.
An 8-cm-wide loudspeaker located approximately 5 cm away from the input
interface of the medium is used as the source of sound, while microphones
mounted on a two-dimensional translational stage measure the acoustic field
above the lens. We work at f = 417.5 Hz, a frequency at the lower edge of the
negative band, that is, where the magnitude of the negative effective index is
highest, in order to obtain the best possible resolution. We observe that the field
map displays a cone characteristic of negative refraction in dissipative media
(Fig. 4b). By compensating for the losses that occur during propagation in the
lens (losses that are experimentally characterized over the whole frequency range
of the negative band in Extended Data Fig. 3), we can clearly distinguish the path
of the refracted sound, with a focal spot inside the lens (Fig. 4c), in very good
agreement with the Snell–Descartes law for a metamaterial with an effective
index of about −3, consistent with our analytical and numerical results (see
Methods). On the other side of the superlens, in the vicinity of the surface, we
record the image of the source with a full-width at half-maximum of λ0/15. This
is much smaller than the diffraction-limited focus obtained without the lens
(black curve in Fig. 4d), and even smaller than the width of the source, λ0/5,
owing to a hotspot created by the aperture of a single soda can (a small hole of
width λ0/15). We stress that these focusing results unambiguously stem from
negative refraction rather than from a canalization effect that would arise, for
instance, from a flat band. This is further confirmed by simulations of negatively
refracted Gaussian beams impinging on a larger soda-can slab, by effective
medium simulations and by experimental observations of frequency-dependent
foci positions (see Methods and Extended Data Figs 4–8).
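As a worked illustration of the Snell–Descartes law invoked above (our arithmetic, assuming an ideal flat interface and a real index n = −3 at 417.5 Hz), a ray incident at 60° is refracted to:

$$
\sin\theta_t = \frac{\sin 60^\circ}{n} = \frac{0.866}{-3} \approx -0.289
\quad\Longrightarrow\quad
\theta_t \approx -16.8^\circ .
$$

The negative sign means that the refracted ray lies on the same side of the normal as the incident ray, so rays diverging from a point source in front of the slab converge again inside it, which is the origin of the internal focus.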

Super-resolution can also be demonstrated by discriminating two sources
separated by less than half a wavelength. Two loudspeakers, emitting out of
phase, were placed near the input surface, separated by 13 cm (λ0/7). The
measured pressure field, as well as the loss-compensated field maps
(Fig. 4e, f), show that the slab produces two distinguishable foci inside the
superlens. In the focal plane, the two images are clearly separated, thereby
demonstrating a λ0/7 imaging resolution, far beyond the diffraction limit, in
contrast to the control experiment performed without the superlens (black
curve in Fig. 4g).
We have further verified that the two sources can be distinguished 
whatever the phase shift between them (Extended Data Fig. 9). 

We have demonstrated that it is fairly easy to build double negative 
media from single negative ones. By breaking the symmetry of the unit 
cell of a single negative medium (either by changing the spacing or by 
frequency detuning), multiple scattering of waves guarantees the exist- 
ence of an overlap between a dipolar resonance and a monopolar one. 
This eventually results in a negative effective index of refraction when 
considering an infinite medium. This approach is very general as long 
as near-field coupling between resonators can be neglected, and brings 
a new paradigm to the physics of metamaterials since multiple scatter- 
ing is often neglected owing to the subwavelength spatial scale of those 
media. We emphasize that such a negative effective index of refraction 
should be insensitive to the random positioning of dimers, and that it 
should be easily transposable to three-dimensional metamaterials. Our 
results open the way towards metamaterial crystals. 

Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.

METHODS


Analytical calculations. To calculate the dispersion relation of an infinite, periodic
and lossless medium of lattice constant a, one only needs to know the transmission
coefficient T(ω) through a single unit cell. Indeed, the Bloch wavenumber k is the
solution of the equation:

$$\cos(ka) = \mathrm{Re}\left(\frac{1}{T(\omega)}\right)$$

Evaluating T(ω) is usually done by means of simulations, but here we propose an
analytic formulation based on resonant point scatterers. In this context, a
resonator is described by a so-called t-matrix t(ω). The frequency dependence of
t(ω) has to satisfy the optical theorem (which is equivalent to energy
conservation), and for a resonant point scatterer in one dimension it takes the
form:

$$t(\omega) = \frac{2\omega}{c}\,\frac{\Gamma}{\Omega - \omega - i\Gamma}$$

where Ω is the resonance angular frequency of the point scatterer, and Γ is the
resonance linewidth. If we consider that this point scatterer, situated at x = x0, is
excited by a normalized incoming plane wave, we can calculate the field at any
position with the formula:

$$\psi(x,\omega) = e^{i\omega x/c} + G_0(x - x_0)\,t(\omega)\,e^{i\omega x_0/c}$$


where G0(x) is the one-dimensional Green’s function. As a consequence, the
transmission coefficient through a single point scatterer in a one-dimensional
space is:

$$T(\omega) = 1 + \frac{ic}{2\omega}\,t(\omega)$$


To calculate the transmission coefficient for unit cells containing N different
resonators, we need to invoke multiple scattering theory. Considering N resonators
(the resonator labelled α is located at coordinate x_α and is described by the
t-matrix t_α(ω)), we can calculate the field at any position in response to a
normalized plane-wave excitation with:

$$\psi(x,\omega) = e^{i\omega x/c} + \sum_{\alpha=1}^{N}\sum_{\beta=1}^{N} G_0(x - x_\alpha)\,W_{\alpha\beta}(\omega)\,e^{i\omega x_\beta/c}$$

Here we have introduced the matrix W(ω), whose inverse has the elements:

$$\left(W^{-1}(\omega)\right)_{\alpha\beta} = \delta_{\alpha\beta}\,\big(t_\alpha(\omega)\big)^{-1} - (1 - \delta_{\alpha\beta})\,G_0(x_\alpha - x_\beta)$$


Solving the multiple scattering problem for a two-resonator unit cell in order to get
the transmission coefficient T(ω) thus only requires inverting a 2 × 2 matrix, which
is what we did frequency by frequency in order to build the dispersion relations
shown in Fig. 2.
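The procedure above (build W⁻¹(ω), invert it, read off T(ω), then solve cos(ka) = Re(1/T)) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the speed of sound is taken as c = 343 m/s; the one-dimensional Green's function is assumed to be G0(x) = (ic/2ω)e^{iω|x|/c}, the sign choice that reproduces T(ω) = 1 + (ic/2ω)t(ω) for a single scatterer; and the unit-cell transmission is taken to include the propagation phase e^{iωa/c} across the cell, which is what makes cos(ka) = Re(1/T) hold for a lossless cell. All function names, and the linewidth and detuning in the usage lines, are hypothetical.

```python
import numpy as np

C_AIR = 343.0  # assumed speed of sound in air (m/s)


def t_matrix(omega, Omega, Gamma):
    """Lossless 1D resonant point scatterer: t = (2 omega / c) * Gamma / (Omega - omega - i Gamma)."""
    return (2.0 * omega / C_AIR) * Gamma / (Omega - omega - 1j * Gamma)


def green_1d(x, omega):
    """Assumed 1D Green's function G0(x) = (i c / 2 omega) exp(i omega |x| / c)."""
    return (1j * C_AIR / (2.0 * omega)) * np.exp(1j * omega * np.abs(x) / C_AIR)


def transmission(omega, positions, Omegas, Gammas):
    """Transmission through N point scatterers, phase referenced to the incident plane wave."""
    x = np.asarray(positions, dtype=float)
    n = x.size
    w_inv = np.empty((n, n), dtype=complex)
    for a in range(n):
        for b in range(n):
            if a == b:
                w_inv[a, b] = 1.0 / t_matrix(omega, Omegas[a], Gammas[a])
            else:
                w_inv[a, b] = -green_1d(x[a] - x[b], omega)
    W = np.linalg.inv(w_inv)
    q = omega / C_AIR
    # To the right of every scatterer, G0(x - x_a) = (i c / 2 omega) exp(i q (x - x_a)),
    # so T = 1 + (i c / 2 omega) * sum_ab exp(-i q x_a) W_ab exp(i q x_b).
    return 1.0 + (1j * C_AIR / (2.0 * omega)) * (np.exp(-1j * q * x) @ W @ np.exp(1j * q * x))


def cos_bloch(omega, cell_length, positions, Omegas, Gammas):
    """Right-hand side of cos(k a) = Re(1 / T_cell), with T_cell = T * exp(i omega a / c)."""
    T_cell = transmission(omega, positions, Omegas, Gammas) * np.exp(1j * omega * cell_length / C_AIR)
    return float(np.real(1.0 / T_cell))


def bloch_wavenumber(omega, cell_length, positions, Omegas, Gammas):
    """Bloch wavenumber k in [0, pi / a], or NaN inside a band gap (|cos(k a)| > 1)."""
    c = cos_bloch(omega, cell_length, positions, Omegas, Gammas)
    return np.arccos(c) / cell_length if abs(c) <= 1.0 else float("nan")


if __name__ == "__main__":
    f0 = 420.0                          # experimental resonance frequency (Hz)
    Omega0 = 2.0 * np.pi * f0
    d = C_AIR / f0 / 12.0               # scatterer spacing lambda0 / 12, as in the Methods
    Gamma = 2.0 * np.pi * 5.0           # hypothetical linewidth (rad/s)
    Omegas = [Omega0, 0.99 * Omega0]    # hypothetical detuning for a bi-disperse cell
    for f in np.linspace(380.0, 440.0, 13):
        k = bloch_wavenumber(2.0 * np.pi * f, 2.0 * d, [0.0, d], Omegas, [Gamma, Gamma])
        print(f"{f:6.1f} Hz   k * (2d) = {k * 2.0 * d:.3f}")
```

For a single scatterer per cell the sketch reduces to cos(ka) = cos(ωa/c) − [Γ/(Ω − ω)] sin(ωa/c), a Kronig–Penney-like relation; with two scatterers per cell the same matrix inversion accounts for every multiple-scattering path between them.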

The multiple scattering theory is well-known in the field of random media, 
where numerous scatterers are considered. Inverting the matrix can become 
difficult and it is sometimes useful to neglect some terms in the multiply 
scattered waves in order to describe the coherent wave that travels inside the 
medium. One approach, namely the independent scattering approximation, 
consists of neglecting the multiply scattered terms that see a given scatterer 
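at least twice. For the two-resonator unit cell, the independent scattering
approximation truncates the double summation, and the transmission coefficient in
this approximation is:

$$T_{\mathrm{ISA}}(\omega) = 1 + \frac{ic}{2\omega}\big[t_1(\omega) + t_2(\omega)\big] + \left(\frac{ic}{2\omega}\right)^{2} t_1(\omega)\,t_2(\omega)\left(1 + e^{2i\omega|x_2 - x_1|/c}\right)$$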
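To see what the approximation discards, the sketch below (Python/NumPy) compares the exact two-scatterer transmission with the truncated expression. It assumes c = 343 m/s and the one-dimensional Green's function G0(x) = (ic/2ω)e^{iω|x|/c}, the sign choice consistent with T(ω) = 1 + (ic/2ω)t(ω); function names and the linewidth are hypothetical. Far from resonance the two agree closely, because every neglected path carries at least three factors of (ic/2ω)t; at resonance, where (ic/2ω)t(Ω) = −1, the truncation fails completely.

```python
import numpy as np

C_AIR = 343.0  # assumed speed of sound in air (m/s)


def t_matrix(omega, Omega, Gamma):
    """Lossless 1D resonant point scatterer t-matrix."""
    return (2.0 * omega / C_AIR) * Gamma / (Omega - omega - 1j * Gamma)


def t_exact_pair(omega, x1, x2, t1, t2):
    """Exact transmission through two point scatterers by inverting the 2 x 2 matrix W^-1."""
    g = 1j * C_AIR / (2.0 * omega)                          # G0(0) for the assumed Green's function
    G12 = g * np.exp(1j * omega * abs(x2 - x1) / C_AIR)     # G0(x1 - x2) = G0(x2 - x1)
    W = np.linalg.inv(np.array([[1.0 / t1, -G12], [-G12, 1.0 / t2]]))
    x = np.array([x1, x2], dtype=float)
    q = omega / C_AIR
    return 1.0 + g * (np.exp(-1j * q * x) @ W @ np.exp(1j * q * x))


def t_isa_pair(omega, x1, x2, t1, t2):
    """Independent scattering approximation: keep only paths that visit each scatterer once."""
    g = 1j * C_AIR / (2.0 * omega)
    phase = np.exp(2j * omega * abs(x2 - x1) / C_AIR)      # extra phase of the back-and-forth path
    return 1.0 + g * (t1 + t2) + g**2 * t1 * t2 * (1.0 + phase)


if __name__ == "__main__":
    Omega, Gamma = 2.0 * np.pi * 420.0, 2.0 * np.pi * 5.0  # hypothetical resonance and linewidth
    d = C_AIR / 420.0 / 12.0                                # lambda0 / 12 spacing
    for f in (210.0, 400.0, 420.0):
        w = 2.0 * np.pi * f
        t = t_matrix(w, Omega, Gamma)
        print(f, abs(t_exact_pair(w, 0.0, d, t, t) - t_isa_pair(w, 0.0, d, t, t)))
```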


Choice of parameters for the analytical, numerical and experimental results. 
For the semi-analytical study, the distance between the point scatterers was set to
λ0/12, where λ0 is the wavelength at resonance. This distance corresponds to the
typical size of the simulated and experimental unit cell of soda-can honeycomb
lattices when the cans are packed in a compact arrangement. The linewidth of the
scatterers’ resonance (equivalently, the quality factor of the resonator) was taken
to be of the same order as in the experiment. This allows us to model the refracted
beams with an index of n = −3 (corresponding to wavevectors at the edge of the
Brillouin zone of a 2a = λ0/6 unit cell), which gives a good fit to the experimental
data at the bottom band-edge frequency f = 417.5 Hz.
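The correspondence between n = −3 and the Brillouin-zone edge follows from a short calculation, assuming the free-space wavenumber k0 = 2π/λ0 is evaluated at resonance (417.5 Hz and 420 Hz differ by less than 1%):

$$
k_{\text{edge}} = \frac{\pi}{2a} = \frac{\pi}{\lambda_0/6} = \frac{6\pi}{\lambda_0},
\qquad
|n| = \frac{k_{\text{edge}}}{k_0} = \frac{6\pi/\lambda_0}{2\pi/\lambda_0} = 3 .
$$

The minus sign comes from the negative slope of the band, that is, from phase and group velocities pointing in opposite directions.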

Effective parameters retrieval for media made of resonant point scatterers. In
order to describe the propagation in media presenting a negative index of refrac-
tion, a common approach is to use effective parameters. In the main text and in
Fig. 1, we only presented the dispersion relations of the studied media and
extracted an effective index of refraction. Here, to show the generality of our
approach, we retrieve the effective parameters with commonly used procedures.
At this stage of this Letter, our demonstration is based on the simplest object that
can be used to build the resonant metamaterial, that is, a resonant point scatterer,
so that it remains valid for both electromagnetic and acoustic waves. Since the
experimental proof of concept is, however, realized with the specific case of acous-
tic waves, we here extract parameters corresponding to acoustics, namely the
effective compressibility, the effective mass density ρ and the corresponding
effective phase velocity c, all normalized by the parameters of the host medium,
which in this case is air. The exact same solutions can, however, be found for
electromagnetic waves with the effective relative permittivity ε, the effective
relative permeability μ and the effective index of refraction.

To evaluate those effective parameters, we first calculate the resonant transmis-
sion coefficient T(ω) and the resonant reflection coefficient R(ω) in the studied
one-dimensional chain configuration for the three media presented in Fig. 2:
a periodic medium made of a single resonator, the so-called bi-periodic medium
and finally the bi-disperse medium consisting of slightly detuned resonators. In
these three cases the non-radiative losses of each point scatterer are neglected.
From those coefficients, we use the method presented in ref. 32, which has the
advantage of avoiding the uncertainty in the sign of the solutions. The retrieved
effective coefficients are presented in Extended Data Fig. 1. They unambiguously
demonstrate that the negative bands in Fig. 1 are due to the double negativity of
the two effective parameters describing the medium. Indeed, the two shaded areas
in Extended Data Fig. 1a and b correspond to the part of the spectrum where the
effective phase velocity, the effective compressibility and the effective mass density
are all simultaneously negative. Nevertheless, such effective parameters, even if
commonly used in the community, present some unusual behaviours that are due
to periodic effects. In the case of the single negative medium, for instance, the
effective density displays a dip, which is often referred to as an anti-resonance and
which we do not consider physically relevant.
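The retrieval method of ref. 32 is not reproduced here. As a stand-in, the sketch below (Python/NumPy) implements the widely used S-parameter inversion for an equivalent homogeneous slab: the normalized impedance z follows from the two coefficients, and the index n from the one-way propagation factor e^{ink0d}. Assumptions: normal incidence; r and t referenced at the two faces of a slab (unit cell) of thickness d; the passive-medium branch Re(z) ≥ 0 (the principal square root); the principal branch of the logarithm, adequate only while |Re(n)|k0d < π; and the acoustic relations ρ = nz, compressibility = n/z and phase velocity = 1/n, all relative to air. Function names and numbers in the usage lines are hypothetical.

```python
import numpy as np


def slab_rt(n, z, k0, d):
    """Reflection and transmission of a homogeneous slab at normal incidence (forward model)."""
    X = np.exp(1j * n * k0 * d)            # one-way propagation factor through the slab
    G = (z - 1.0) / (z + 1.0)              # reflection coefficient of a single interface
    denom = 1.0 - G**2 * X**2
    return G * (1.0 - X**2) / denom, X * (1.0 - G**2) / denom


def retrieve(r, t, k0, d, branch=0):
    """Invert (r, t) for the relative index n and normalized impedance z of an equivalent slab."""
    # Principal square root: Re(z) >= 0, the passive-medium branch.
    z = np.sqrt(((1.0 + r) ** 2 - t**2) / ((1.0 - r) ** 2 - t**2))
    X = t / (1.0 - r * (z - 1.0) / (z + 1.0))           # equals exp(i n k0 d)
    n = (-1j * np.log(X) + 2.0 * np.pi * branch) / (k0 * d)
    return n, z


def acoustic_parameters(n, z):
    """Relative mass density, compressibility and phase velocity from n and z."""
    return n * z, n / z, 1.0 / n


if __name__ == "__main__":
    k0 = 2.0 * np.pi / 0.82                 # free-space wavenumber near 420 Hz (1/m)
    d = 0.099                               # hypothetical unit-cell thickness (m)
    r, t = slab_rt(-3.0 + 0.2j, 0.5 + 0.1j, k0, d)     # synthetic lossy negative-index slab
    n, z = retrieve(r, t, k0, d)
    print(n, z, acoustic_parameters(n, z))
```

Generating (r, t) from a known slab and recovering (n, z) is a simple self-consistency check; applied to simulated unit-cell coefficients, the same inversion yields parameters of the kind shown in Extended Data Figs 1 and 2, with the periodicity artefacts discussed above.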
3D simulations. Description. All the simulations are performed using the eigen- 
solver of the finite element simulation software Comsol Multiphysics. The unit cell 
of each medium (highlighted in green in Fig. 3) is simulated by applying Bloch 
periodic boundary conditions to retrieve the dispersion relation. The soda cans are 
modelled with rigid-walled cylinders of 66-mm diameter. The simulated soda can
has a resonance frequency around 418 Hz, slightly below the experimental res-
onance of 420 Hz owing to a small difference in the geometry of the can bottom. To
create a resonator with a small detuning in the case of the bi-disperse medium, we
reduce the height of the simulated can by 3.5 mm, which is equivalent to filling a
can with 1 cl of water. The distance a is set to the diameter of the cans (66 mm,
that is, approximately λ0/12). The simulated triangular and honeycomb crystals
have a diamond-shaped unit cell with diagonals √3a and 3a. The simulated
square unit cells in the case of the bi-disperse medium have a diagonal of 2a.
Effective parameters retrieval. We then apply the same procedure to retrieve the 
effective parameters of the acoustic metamaterials made of soda cans presented in 
the main text (Fig. 3). We perform three-dimensional simulations using Comsol 
Multiphysics to get the transmission coefficients through a single unit cell of the 
metamaterials in the three different configurations. The reference metamaterial, 
which behaves as a single negative medium, consists of a triangular arrangement of 
identical soda cans with a lattice constant of 3a/2 (Fig. 3a). We evaluate the plane 
wave transmission through one layer of soda cans and apply the method of ref. 32 
to retrieve the effective parameters describing the medium. The results displayed 
in Extended Data Fig. 2 show that such a medium behaves as a single negative
medium, since only the effective compressibility falls below zero. When some 
losses comparable to the experimental ones are added to the simulation, we see 
that this metamaterial keeps its single negative property. We then simulated the 
transmission through the honeycomb arrangement of cans (the two-dimensional 
equivalent of the bi-periodic chain). In this case, the simulation consists of mea-
suring the transmission and reflection on two cans separated by a distance of a/2
along the propagation direction and √3a/2 in the transverse direction. To retrieve
the effective parameters, the lattice constant is taken as 3a/2, which again is a signature
of the double periodicity of the medium along the propagation direction. The 
effective index of refraction extracted from the transmission and reflection now 
displays a negative band that is associated with the double negativity, since 
both the effective compressibility and the effective mass density are negative in 
this frequency range. When adding the value of the experimental losses to the 
simulated cans, this negative band remains. Surprisingly, the losses broaden the 
negative band, which is in agreement with what we experimentally measured. 
We further observe that the imaginary part of the effective index of refraction 
reaches a value of approximately 1.5 in the negative band, which is not critical 
for observing transmission through a slab, as we showed experimentally. These 
effective parameters confirm the results from the band diagram and surface plot 
within the first Brillouin zone that we presented in Fig. 3. 
Nevertheless, we chose not to present these results in the main text since we do
not think that the description with effective parameters is the most relevant one.
The main point of the main text is that breaking the symmetry of a single negative
medium leads to the existence of a negative band. Even if the transmission through
the two layers of cans evidences this band, most of the physics is hidden inside this
transmission coefficient, since it results from multiple scattering at the micro-
scopic level within the unit cell. In the metamaterial community, it is commonly
assumed that all scatterers see the same incident wave field, since they are arranged
on a subwavelength scale, and thus that the transmission is simply a superposition
of the responses of many resonators, all in phase. The transmission through two
cans, which gives rise to the negative band, clearly highlights that multiple
scattering actually occurs at the microscopic scale of the medium, and that the two
resonators do not see the same incident wave field.
Beam transmission through a slab demonstrates the negative refraction. In order to
show that a band having a negative slope as shown in Fig. 3 is actually the signature
of a medium exhibiting negative refraction, and that the super-resolution is not a
canalization effect, we perform simulations on a slab geometry. We did not
implement this experimentally because it would have required a sample much
larger than our room. To limit the number of mesh cells, these simulations consider
only a slab layer of height √3a, which corresponds to the vertical unit cell of our
honeycomb arrangement of soda cans. Periodic boundary conditions are then
applied on the vertical boundaries to simulate the infinitely extended slab. This slab
unit cell is excited by an incident plane wave with an incident angle θ with respect
to the normal of the slab, and the phase shift applied on the periodic boundary
conditions matches this value. We perform a set of 61 simulations with θ ranging
from −90° to 90°. From those simulations, we are then able to build an incident
wave field impinging on the slab that corresponds to a Gaussian beam with an
incident angle θ0. To do so, we perform the complex summation of the wave fields
extracted from each simulation, multiplied by a Gaussian coefficient
exp(−(θ − θ0)²/σθ²), where σθ corresponds to the angular aperture of the beam and
is chosen to be equal to 9°. The complete field map at positions that are not within
the simulation area is built by using the Bloch theorem for each incident angle.
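The beam synthesis itself is a weighted sum over the 61 plane-wave solutions. The sketch below (Python/NumPy) illustrates it in free space only: the analytic plane waves exp(ik(x cos θ + y sin θ)) stand in, hypothetically, for the simulated slab fields, and the wavelength of 0.82 m is an assumption close to λ0. The Gaussian weight exp(−(θ − θ0)²/σθ²) and the 3° angular step follow the description above.

```python
import numpy as np


def gaussian_beam(x, y, theta0_deg, sigma_deg=9.0, wavelength=0.82, n_angles=61):
    """Sum of plane waves exp(i k (x cos(theta) + y sin(theta))) weighted by exp(-(theta - theta0)^2 / sigma^2)."""
    thetas = np.radians(np.linspace(-90.0, 90.0, n_angles))       # 61 angles -> 3 degree steps
    weights = np.exp(-((thetas - np.radians(theta0_deg)) ** 2) / np.radians(sigma_deg) ** 2)
    k = 2.0 * np.pi / wavelength
    x = np.asarray(x, dtype=float)[..., None]                      # trailing axis runs over angles
    y = np.asarray(y, dtype=float)[..., None]
    plane_waves = np.exp(1j * k * (x * np.cos(thetas) + y * np.sin(thetas)))
    return (weights * plane_waves).sum(axis=-1)


if __name__ == "__main__":
    xs, ys = np.meshgrid(np.linspace(0.0, 10.0, 201), np.linspace(-5.0, 15.0, 401))
    intensity = np.abs(gaussian_beam(xs, ys, 30.0)) ** 2          # beam tilted by 30 degrees
    print(intensity.shape, intensity.max())
```

In the actual procedure each plane wave would be replaced by the corresponding simulated field of the slab unit cell, extended in y with the Bloch phase of its angle; the summation and weighting stay the same.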

Such a superposition of simulations is realized for distinct frequencies in the
negative band (Extended Data Figs 4–6). We start by analysing the results corres-
ponding to the frequency presented in the main text, that is, 417.5 Hz (Extended
Data Fig. 4). For this frequency, the index of refraction is expected to be −3, and
we want to verify that the refraction satisfies this condition. From the simulations
corresponding to distinct beam incident angles θ0, we evaluate the spatial Fourier
transform of the field inside the crystal and isolate the dominant wavenumber in
the first Brillouin zone. This gives the Bloch wavenumber and hence
the effective index of refraction inside the crystal. For the four angles represented
in Extended Data Fig. 4, a value of −3 was confirmed. The ray tracing super-
imposed on the beams corresponds to this value and is in very good agreement
with the propagation of the beam through the slab, as well as with the simulation
of an effective medium (index of refraction of −3 and an effective density of
−1.75 kg m⁻³) represented on each subpanel. This simulation first confirms that
we have negative refraction, since the outgoing beam seems to come from a 
y coordinate that is higher than the coordinate of the incident beam. Second, since 
the same effective index of refraction has been used for the four angles, this 
confirms that the refraction law can be described by an effective index of refraction 
instead of anisotropic parameters. Third, these results do not show any diffraction 
order at the exit of the slab as one would expect from periodic media. This comes 
from the fact that the medium is organized on a scale much smaller than the free- 
space wavelength and therefore all of the diffraction orders are evanescent at the 
exit interface, again confirming the description of refraction with a negative index. 
Nevertheless, these simulations also show that the reflection at the first interface is
strong, a consequence of the impedance mismatch. Some differences can be
noted between the honeycomb medium and the effective one in the amplitude of 
the reflection, which means that the effective parameters that we extracted from 
the normal incidence on a single unit cell do not completely match the physics of 
the system. Probably more elaborate techniques that simulate the transmission 
with non-normal incidence on the unit cell would permit better effective para- 
meters to be retrieved, but we have not found such sophisticated procedures 
in the literature. This is a reason why we opted in the main text not to describe 
the propagation in terms of the effective parameters, but rather insist on the 
fact that the double periodicity of the honeycomb lattice creates the negative 
refraction effect. 
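The Fourier-transform step used above to read the Bloch wavenumber can be sketched as follows (Python/NumPy; a minimal illustration, not the authors' processing). Assumptions: the field is sampled along a line parallel to the propagation direction with step dx; an e^{−iωt} time convention, so that a wave carrying energy toward +x in a negative-index medium has spatial dependence e^{ink0x} with n < 0; zero-padding to refine the wavenumber grid; and a hypothetical lattice period used to restrict the search to the first Brillouin zone, |k| ≤ π/period.

```python
import numpy as np


def dominant_index(field, dx, k0, period, n_fft=2**16):
    """Effective index from the dominant spatial frequency of a field sampled along x.

    The search is restricted to the first Brillouin zone |k| <= pi / period, and the
    sign of the returned index is the sign of the dominant Bloch wavenumber.
    """
    spectrum = np.fft.fft(np.asarray(field, dtype=complex), n=n_fft)   # zero-padded FFT
    k = 2.0 * np.pi * np.fft.fftfreq(n_fft, d=dx)                      # wavenumber of each bin (rad/m)
    in_zone = np.abs(k) <= np.pi / period
    k_dom = k[in_zone][np.argmax(np.abs(spectrum[in_zone]))]
    return k_dom / k0


if __name__ == "__main__":
    k0 = 2.0 * np.pi * 418.5 / 343.0          # free-space wavenumber at 418.5 Hz
    x = np.arange(0.0, 1.2, 0.005)            # hypothetical sampling line inside the slab (m)
    field = np.exp(-1j * 2.4 * k0 * x)        # synthetic field with n = -2.4
    print(dominant_index(field, 0.005, k0, period=0.099))
```

Zero-padding only interpolates the spectrum; the ability to separate two nearby wavenumbers is still set by the length of the sampled line.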

To give a more complete picture of the honeycomb soda-can metamaterial, we 
run other sets of simulations for different frequencies. In Extended Data Figs 5 and 
6, we show the same kind of results for respective frequencies of 418.5 Hz and 
419.5 Hz. The corresponding effective indices of refraction are −2.4 and −1.5,
respectively, and the effective mass densities are −1.35 kg m⁻³ and −0.5 kg m⁻³,
respectively. Those simulations confirm the negative refraction effect prev-
iously observed, but some small differences can be noted. For example, for the
incident angles of 30° and 60° the outgoing beam does not completely match the 
ray tracing. We attribute this effect to the existence of Fabry-Perot resonances 
within the slab’s thickness, and the multiple reflections inside the slab vertically 
shift the outgoing beam. These Fabry–Perot resonances come from the impedance
mismatch at the two interfaces of the slab, which means that the reflection is not
zero. Also, when looking at the map of the 60° incident beam (it is also visible on
the others), one can see that the metamaterial slab creates backward reflection in 
the direction of the incident beam. Indeed, the intensity map shows cancellations 
and maxima of the field, which are the signature of stationary waves. The reflected 
beam (positive values of y) does not show these stationary effects. To our know- 
ledge, this has not been described in the literature and would merit some deeper 
investigation. 

Acoustic experiments. Experimental setup. The soda cans are placed in a honey- 
comb compact arrangement, with the apertures of all cans oriented in the same 
direction. Acoustic absorbers are placed first on the short edges of the slab, to
prevent reflections and the formation of undesired edge states in the slab, and
second on the walls surrounding the slab, to avoid reflections from the walls of the
room. A two-dimensional translational stage with a 1-m range in each direction is
used to scan the pressure field. A set of two microphones mounted on a perpen-
dicular arm on the stage is used in order to increase the scanning area. The
microphones are placed approximately one centimetre above the cans’ apertures 
and are set to record sound with the same gain using an amplifier. An 8-cm-wide 
loudspeaker placed approximately 5 cm away from the slab’s interface, in front of 
the middle can, produces a 2.5-s-long chirp signal, ranging from 200 Hz to 700 Hz. 
The two-source experiment is performed by linear combination: a second one-
source experiment is performed with the loudspeaker shifted by a distance λ0/7
(in front of the first neighbouring can), and the maps are finally obtained
by summing the results of the two experiments (with a change of sign for the
second one). The field maps are obtained by Fourier transforming the recorded
time-dependent signals and are displayed with a saturated colour scale. The loss-
compensated maps are obtained by evaluating the exponential decay coefficient α
of the field amplitude along the line y = 0, a decay that is due to dissipative losses
within the cans. The field within the slab at each x is then amplified by a factor
e^{αx}, with α = 0.0079 mm⁻¹ (equivalent to an attenuation length L = 1/α ≈ 127 mm).
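The processing chain described above can be sketched in Python with NumPy. This is a hypothetical illustration: taking the FFT bin nearest to the analysis frequency is an assumption (the text does not say whether the spectra were normalized by the chirp spectrum), and all function names are invented here; the out-of-phase combination and the factor e^{αx} follow the description, with α = 0.0079 mm⁻¹.

```python
import numpy as np


def complex_amplitude(signal, fs, f):
    """Complex Fourier coefficient of a recorded time signal at the FFT bin nearest to f (Hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - f))]


def two_source_map(map_first, map_shifted):
    """Out-of-phase two-source map: sum of the two one-source maps, sign changed on the second."""
    return np.asarray(map_first) - np.asarray(map_shifted)


def compensate_losses(field, x, alpha):
    """Multiply the field inside the slab by exp(alpha * x) to undo the exponential attenuation."""
    return np.asarray(field) * np.exp(alpha * np.asarray(x))


if __name__ == "__main__":
    alpha = 0.0079                                   # attenuation coefficient (1/mm), from the text
    x = np.linspace(0.0, 300.0, 31)                  # hypothetical depths inside the slab (mm)
    decaying = np.exp(-alpha * x) * np.exp(1j * 0.02 * x)
    print(np.abs(compensate_losses(decaying, x, alpha)))   # flat amplitude after compensation
```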
Extracting the imaginary part of the index of refraction. It is well known that the 
initial proposal of a perfect lens with a negative index of refraction is hampered 
by the presence of losses, which are inherent to the use of resonant unit cells to 
build the metamaterial. Even our Helmholtz resonators, that is, the soda cans, 
which have been chosen because they minimize losses, present some attenuation 
effects and hence cannot be considered as perfect resonators. In order to estim- 
ate those losses within the propagating band with a negative index of refraction, 
we evaluate the attenuation length in the corresponding frequency band. To do 
so, we average the absolute value of the complex field within the medium for 
each x coordinate. We find an exponentially decreasing behaviour at each fre-
quency, and we fit the logarithm of those curves with a linear fit. The magnitude
of the slope, α, corresponds to the inverse of the attenuation length, and the
imaginary part of the index of refraction is n″(ω) = (c0/ω)α(ω).
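The attenuation estimate can be sketched as a log-linear fit (Python/NumPy; function names hypothetical). The sketch assumes the complex field map is stored with rows along y and columns along x and uses c0 = 343 m/s; since the fitted slope of ln|ψ| is negative, α is taken as its magnitude.

```python
import numpy as np


def attenuation_coefficient(field_map, x):
    """Fit the log of the y-averaged |field| versus x; return alpha = -slope (in 1 / units of x)."""
    profile = np.mean(np.abs(field_map), axis=0)      # average over y (rows) for each x (column)
    slope, _ = np.polyfit(x, np.log(profile), 1)
    return -slope


def imaginary_index(alpha, f, c0=343.0):
    """n''(omega) = (c0 / omega) * alpha, with alpha in 1/m, f in Hz and c0 in m/s."""
    return c0 * alpha / (2.0 * np.pi * f)


if __name__ == "__main__":
    x = np.linspace(0.0, 0.3, 31)                                  # hypothetical depths (m)
    rows = np.array([1.0, 0.5, 2.0])[:, None]                      # hypothetical amplitudes across y
    field = rows * np.exp(-7.9 * x)[None, :] * np.exp(1j * 20.0 * x)[None, :]
    alpha = attenuation_coefficient(field, x)
    print(alpha, imaginary_index(alpha, 417.5))
```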

In the results presented in Extended Data Fig. 3, we note that the attenuation is
relatively strong and is largest at frequencies where the index of refraction takes
its highest absolute value. These results are in agreement with the imaginary part
of the effective index of refraction extracted from the simulation of the
transmission through a unit cell of the metamaterial (Extended Data Fig. 2). Given
the thickness of our slab in terms of the free-space wavelength, these losses are not
critical, and a sufficient part of the energy is transmitted through the slab to create
a focus. This attenuation explains why the medium with an index of refraction of
−1 does not give perfect lensing, since the attenuation prevents the amplification
of the evanescent part of the spectrum. The focus therefore remains diffraction
limited for this index of refraction. Nevertheless, exploiting the high absolute
values of the index of refraction at 417.5 Hz, as is done in the main text, allows us
to observe subwavelength spots even in the presence of attenuation, as is the case
in our experimental setup.

Effective index extraction from tracking the focus. In the main text we only display 
the acoustic field maps corresponding to the focus beyond the diffraction limit at 
417.5 Hz (Fig. 4). But the experimental data gives access to a broad frequency 
range. Since the negative branch is strongly dispersive, as shown by the band 
calculation in Fig. 3, the effective index varies. Here we show that tracking the 
position of the focus on the other side of the lens allows the measurement of this 
change in the effective index that characterizes the honeycomb arrangement of 
soda cans. To do so, we renormalize the acoustic intensity measured at different
frequencies by the maximum of the intensity measured in the area behind the flat
lens. This enhances the intensity in the focusing area and allows the focused wave
field to be observed. The result of this operation is shown in Extended Data Fig. 7
for three different frequencies. The white dashed lines on the figures highlight the
position of the focus along the depth (x direction). Depending on the index of
refraction, the focus is either stuck to the surface or well detached, which can be
easily explained by ray tracing. For the three different maps we superimpose the
ray tracing corresponding to an incident angle of 60°. To find the right index of
refraction within the lens (between the two red dashed lines), we pick the value
that produces a ray crossing at the observed focus. By doing so, we find indices of
refraction of −3, −1.1 and −1 for the frequencies of 417, 427 and 445 Hz,
respectively. The map at 417 Hz deserves one more comment: since the evanescent
field is very strong, the focus seems to remain at the last layer of the cans while the
maximum seems to be on top of the last can’s aperture. But owing to the symmetry
of the field when the losses are compensated within the lens (Fig. 4c), we chose the
index of refraction that gives a crossing inside the lens at half its thickness. The two
other maps clearly show that the description with a negative effective index of
refraction is fully relevant to predict the wave field transmitted through the lens,
even if the wave field inside the experimental lens exhibits some hot spots due to
the cans’ apertures. Contrary to a perfect lens, the focus shown on the last map,
where the index of refraction is −1, is diffraction limited. This comes from the
inherent losses of the medium under study, characterized in a previous paragraph 
of the Methods. 
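The ray-tracing step can be sketched in Python with NumPy under simplifying assumptions: a point source on the axis at distance s in front of a slab of thickness D, ideal flat faces, a real negative index, and Snell–Descartes refraction at both faces; losses and the discreteness of the cans are ignored. The automated grid search is a hypothetical convenience (in the text the index is chosen by inspecting the maps), and all names are illustrative.

```python
import numpy as np


def flat_lens_foci(n, theta_deg, s, D):
    """Axis crossings of one ray through a flat slab of negative index n.

    Returns (depth of the internal focus from the entrance face, or None;
             distance of the external focus beyond the exit face, or None).
    """
    if n >= 0:
        raise ValueError("this sketch assumes a negative index")
    theta = np.radians(theta_deg)
    sin_in = np.sin(theta) / n                    # Snell-Descartes: sin(theta) = n * sin(theta_in)
    if abs(sin_in) > 1.0:
        return None, None                         # no propagating refracted ray
    tan_in = np.tan(np.arcsin(sin_in))           # negative: ray bent back toward the axis
    y_entry = s * np.tan(theta)                   # height at which the ray enters the slab
    depth = y_entry / abs(tan_in)                 # where the refracted ray crosses the axis
    internal = depth if depth <= D else None
    y_exit = y_entry + D * tan_in                 # height at the exit face
    # Outside the slab the ray travels at angle theta again; it crosses the axis if y_exit < 0.
    external = -y_exit / np.tan(theta) if y_exit < 0 else None
    return internal, external


def index_from_external_focus(d_obs, theta_deg, s, D, candidates=np.linspace(-6.0, -0.5, 5501)):
    """Grid search for the negative index whose external focus lies closest to d_obs."""
    best, best_err = None, np.inf
    for n in candidates:
        _, ext = flat_lens_foci(n, theta_deg, s, D)
        if ext is not None and abs(ext - d_obs) < best_err:
            best, best_err = n, abs(ext - d_obs)
    return best


if __name__ == "__main__":
    s, D = 0.05, 0.30                             # source distance (m) and hypothetical slab thickness (m)
    for n in (-3.0, -1.1, -1.0):
        print(n, flat_lens_foci(n, 60.0, s, D))
```

For n = −1 the sketch recovers the textbook flat-lens geometry: an internal focus at depth s and an external focus at distance D − s beyond the exit face, independent of the ray angle.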

The same procedure has been applied for all frequencies ranging from 415 to 
450 Hz to track the focus position more systematically. In Extended Data Fig. 8 
the profiles of the focus along the longitudinal direction (x direction in the 
previous maps) for all frequencies are displayed. From those profiles we clearly
see the existence of two distinct regimes: a first one in which the field is dominated
by evanescent waves and the focus seems to be attached to the lens, appearing at
the position of the last layer of cans; and a second one in which the focus is well
detached and it becomes feasible to find an effective index of refraction by ray
tracing. In this second regime, we notice that the focus moves further from the lens
as the frequency increases, meaning that the absolute value of the index of
refraction decreases. This is in agreement with the band diagram presented in
Fig. 3, but these experimental results surprisingly show that the negative band is
broader than expected from the lossless simulation of the unit cell (Fig. 3b).

Changing the phase difference between the two sources. In the acoustic experiment
presented in Fig. 4, we show that the field generated by two sources separated by
λ0/7 and emitting out of phase is reconstructed on the other interface of the
superlens. This configuration may seem the optimal one, since the interference
between the two sources generates, by symmetry, a zero of the field between them.
In order to avoid any misinterpretation, we build the field maps obtained for any
phase difference ranging from 0 to 360° between the two sources. From those maps
we extract the intensity received in the focal plane. We then build, in Extended Data
Fig. 9a, a map of those intensity profiles as a function of the phase difference
between the two sources. This map shows that, whatever the phase shift, the super-
position of the two focal spots leads to the creation of two well-separated foci. For
the sake of completeness, we also represent in Extended Data Fig. 9b the real part of
the field measured in the focal plane. The focused field presents the same depend-
ence as the emitted one: the real part of the field measured at the focus corres-
ponding to the first source varies from 1 to −1 and back to 1 while the phase
difference between the emitting sources varies from 0 to 360°. From the intensity
map, one can notice that when the two sources are in quadrature, the two
intensities measured on the two focal spots are not strictly the same, owing to the
existence of side lobes for a single focus. The two sources, however, remain sepa-
rable in any configuration. As claimed in the main text, those results confirm that
the resolution of the superlens is effectively λ0/7.
Extended Data Figure 1 | Effective parameters for metamaterials made from resonant point scatterers. a, Parameters for the single negative medium made of a periodic arrangement of identical point scatterers. b,c, Parameters for the bi-periodic medium (b) and the bi-disperse medium (c), showing the double negativity for each medium. f, Frequency; f0, resonant frequency; see Methods for definitions of variables shown in the keys.
Extended Data Figure 2 | Effective parameters for the simulated soda-can metamaterials extracted from the transmission and reflection of a single unit cell. a–h, Speed of sound (a, e), impedance (b, f), compressibility (c, g) and modulus (d, h) for the triangular (left column) and the honeycomb (right column) arrangements. Thick lines, with loss; thin lines, without loss; blue lines, real part; green lines, imaginary part.
Extended Data Figure 3 | Experimental estimation of the imaginary part of the effective index of refraction within the negative band.
Extended Data Figure 4 | Simulated Gaussian beam incident on an infinite slab showing negative refraction at a frequency of 417.5 Hz. a–d, Simulations for incident angles of 15° (a), 30° (b), 45° (c) and 60° (d); for each angle, data are shown for a lossless slab made of soda cans (left) or of negative effective medium (right). The frequency corresponds to 417.5 Hz, for which the effective index of refraction is −3. Red lines delimit the slab interfaces, while white arrows indicate the direction of the incident, reflected and refracted beams.
Extended Data Figure 5 | Simulated Gaussian beam incident on an infinite slab showing negative refraction at a frequency of 418.5 Hz. As Extended Data Fig. 4 but for 418.5 Hz, and an effective index of refraction of −2.4.
Extended Data Figure 6 | Simulated Gaussian beam incident on an infinite slab showing negative refraction at a frequency of 419.5 Hz. As Extended Data Fig. 4 but for 419.5 Hz, and an effective index of refraction of −1.5.
Extended Data Figure 7 | Renormalized maps of the measured acoustic pressure intensity for three different frequencies within the negative index of refraction band. a, f = 417 Hz; b, f = 427 Hz; and c, f = 445 Hz. For each frequency the ray tracing is superimposed (red). The red dashed lines correspond to the slab's interfaces; the white dashed lines highlight the focus depth.
Extended Data Figure 8 | Longitudinal profiles of the focus for varying frequencies. Bottom panel: profiles of the field within the lens (red shading), in its near field (yellow shading) and its far field (green shading) for different frequencies within the negative band (colour key). Top panel: tracked position of the foci in the previously described regions for the same frequencies in the negative band (colour key).
Extended Data Figure 9 | Effect of the phase shift between the two sources on the intensity measured on the focal plane. a, Map of the intensity of the field in the focal plane along the normalized axis y/λ0 (colour coded, key at right) as a function of the phase shift between the sources. b, Profiles of the normalized real part of the field (Re(P)) along the normalized axis y/λ0 for varying phase shifts (colour key) between the two sources.
Guiding the folding pathway of DNA origami


Katherine E. Dunn, Frits Dannenberg, Thomas E. Ouldridge, Marta Kwiatkowska, Andrew J. Turberfield & Jonathan Bath


DNA origami is a robust assembly technique that folds a single-stranded DNA template into a target structure by annealing it with hundreds of short ‘staple’ strands. Its guiding design principle is that the target structure is the single most stable configuration. The folding transition is cooperative and, as in the case of proteins, is governed by information encoded in the polymer sequence. A typical origami folds primarily into the desired shape, but misfolded structures can kinetically trap the system and reduce the yield. Although adjusting assembly conditions or following empirical design rules can improve yield, well-folded origami often need to be separated from misfolded structures. The problem could in principle be avoided if assembly pathway and kinetics were fully understood and then rationally optimized. To this end, here we present a DNA origami system with the unusual property of being able to form a small set of distinguishable and well-folded shapes that represent discrete and approximately degenerate energy minima in a vast folding landscape, thus allowing us to probe the assembly process. The high yield of well-folded origami structures obtained confirms the existence of efficient folding pathways, while the shape distribution provides information about individual trajectories through the folding landscape. We find that, similarly to protein folding, the assembly of DNA origami is highly cooperative; that reversible bond formation is important in recovering from transient misfoldings; and that the early formation of long-range connections can very effectively enforce particular folds. We use these insights to inform the design of the system so as to steer assembly towards desired structures. Expanding the rational design process to include the assembly pathway should thus enable more reproducible synthesis, particularly when targeting more complex structures. We anticipate that this expansion will be essential if DNA origami is to continue its rapid development and become a reliable manufacturing technology.

This study is based on a simplified version of the archetypal origami tile and, in particular, on the distribution of observed folds of a ‘dimer’ variant which contains two copies of the template sequence in head-to-tail repeat. The ‘monomer’ tile (Fig. 1) is created by annealing a 2,646-nucleotide (nt) circular template with 90 staples, each designed to hybridize to one or more 15- or 16-base domains of the template. Of these, 76 staples mediate interactions between pairs of non-contiguous template domains, as follows: 66 U-shaped ‘body’ staples form short-range contacts between domains that are relatively close in the primary sequence of the template; and 5 pairs of ‘seam’ staples form long-range contacts, bridging between positions where the template folds back on itself to form a central seam. Unlike the interactions between amino acid residues that stabilize a protein, staples mediate interactions between template domains that are highly specific: each staple can be considered to bind stably only to complementary domains of the template. The designed fold of the monomer tile corresponds to an absolute minimum in the free energy landscape. This origami folds with high yield to form discrete rectangular tiles of approximately 80 nm × 40 nm (Fig. 1c); approximately 80% of tiles appear to be well folded.

The ‘dimer’ template is also circular. It contains two identical copies of the monomer joined head-to-tail and can therefore bind two copies of each staple (Fig. 2). Each pair of body and seam staples can bind in one of two configurations (Fig. 2a) to form either an internal link within each copy of the monomer sequence or a pair of cross-links between the two copies. With two configurations for each of the 76 staples that link non-contiguous domains, the total number of possible domain pairings is $2^{76} \approx 10^{23}$. Although many of these configurations are sterically inaccessible, it is clear that reducing the specificity of staple binding means that, as in the case of protein folding, the number of possible states of the system is overwhelmingly greater than the number of well-folded structures. However, in contrast to proteins (and to conventional origami structures), there is more than one ‘well-folded’ state (Fig. 2): not one but a handful of well-folded states occupy discrete energy minima in a vast configurational landscape. Remarkably, when the dimer origami is annealed by cooling from 95 °C, a small set of well-folded shapes is formed with good yield: each consists of a pair of rectangular tiles attached on one edge (Fig. 2b, c). The probability of finding well-folded structures by random search of configuration space is negligible; therefore, efficient folding pathways must exist. As in protein folding, assembly is constrained such that the system is highly likely to discover free-energy minima that correspond to well-formed final states.

The dimer origami tile has 22 template routings that correspond to well-folded configurations in which all staple binding sites are occupied and in which the tile is expected to be planar and unstrained. These give 6 unique shapes, each with a characteristic offset between two linked rectangular components which have essentially the same structure as the monomer tile (Fig. 3a–c and Extended Data Fig. 1). These shapes can be grouped into classes according to the contacts made by the seam staples: fold m:n has m pairs of seam staples that connect domains within each half of the template and n pairs of seam staples that form connections between domains in opposite halves (Fig. 3a–c). Folds m:n and n:m are related by symmetry and are therefore not distinguished in our experiments or analysis (Extended Data Fig. 1b). A set of non-planar folds adds a seventh shape to the six defined above and a further 52 template routings (Fig. 3d and Extended Data Fig. 2). Fink and Ball have estimated the maximum number of distinct, compact configurations that can be encoded into a single polymer sequence: for a polymer of 168 unique domain types on a square lattice, the theoretical limit is 13. A major factor in allowing the large number of folds in our system is the extensive re-use of structural motifs within distinct folds, a possibility not considered by Fink and Ball.

Figure 1 | The monomer tile. a, 66 body staples (blue) and 5 pairs of seam staples (brown) each hybridize to two non-contiguous domains of the circular template. Edge staples (grey) fill gaps at the top and bottom of the structure. Hybridization of body and seam staples pins the corresponding domains of the template together, determining the unique stable, rectangular fold of this simplified origami tile as indicated in b. c, Atomic force micrograph of the monomer tile (scale bar, 50 nm).

Figure 2 | Folding origami tiles with a dimer template. a, Left, the base sequence of the green section of the template is the same as that of the pink section, so the dimer template can fully hybridize to two copies of each staple. Arrows indicating staple binding and unbinding transitions are annotated with reaction rates used in a model of assembly. Two identical staples can bind to the template in one of two configurations, binding together pairs of domains within each section or connecting domains in different sections. This gives a vast number of possible configurations, a handful of which are well-folded (shown right, and defined in Fig. 3 and Extended Data Figs 1 and 2). Ordered folds of the dimer template comprise two linked rectangular tiles with a characteristic offset on the long or short edges, for example, b or c, respectively (in each case, an AFM image with a scale bar of 50 nm is shown alongside a schematic of the fold).

Atomic force microscopy (AFM) enables us to distinguish different configurations of the template, and this provides a unique opportunity to study folding pathways. Samples of annealed origami were imaged by AFM. Most observed shapes are consistent with the classification scheme shown in Fig. 3, and the outlines of 44% of objects identified as candidate dimer tiles were successfully fitted to measure the offset between the two component monomer tiles (Fig. 3e and Extended Data Fig. 3).

The distribution of tile shapes was compared to predictions made using a Markov chain model of folding in which each transition corresponds to binding or unbinding of a single staple domain (Fig. 2, Methods section ‘Folding model’). An unbound staple at concentration c binds to the template with a rate $k_+ c$ (where $k_+ = 10^6\,\mathrm{M^{-1}\,s^{-1}}$, ref. 25). After one half of a staple has bound, the second half can bind with a rate $k_+ c_{\mathrm{eff}}$ that depends on its effective concentration, $c_{\mathrm{eff}}$, at the corresponding template domain. The effective concentration depends on the proximity of the template domain which, in turn, depends on the contacts between template domains already established by hybridization of other staples. We expect folding to be dominated by short-range interactions because staples are more likely to connect two template domains that are spatially close, either because they are closely spaced along the template or because the previous binding of other staples is holding them together. To determine the effective concentration, the shortest path through the part-assembled origami that connects the complementary template and staple domains is identified. This connection is modelled as a heterogeneous freely jointed chain with double-stranded (ds) and single-stranded (ss) DNA components. The effective concentration of the part-bound staple at the complementary template domain is related to the probability that the ends of the chain lie spontaneously within a (short) interaction range. Unbinding of a staple domain is treated as a two-state transition, with a configuration-independent rate $k_- = k_+ \exp(\Delta G^{0,\mathrm{duplex}}/RT)\,(1\,\mathrm{M})$, where $\Delta G^{0,\mathrm{duplex}}$ is the change in standard free energy on forming the duplex at standard concentrations of 1 M. To represent steric constraints on folding, the state space of the model is restricted to patterns of staple binding in which each segment of the partially folded origami occurs in one of a set of pre-defined, well-ordered folds.
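The rate expressions above can be made concrete with a short sketch. It estimates c_eff for a freely jointed chain in the Gaussian approximation, where the probability of the chain ends lying within a small capture volume, divided by that volume and by Avogadro's number, tends to the end-to-end density at zero separation. It then evaluates the three rates named in the text. The Gaussian approximation, the treatment of each 16-bp duplex as one rigid 0.34 nm/bp segment, and the free-energy value used in the check are illustrative assumptions, not the paper's parameters; the model's own handling of the heterogeneous ds/ss chain may differ.

```python
import math

N_A = 6.02214076e23   # Avogadro constant, mol^-1
R = 8.314462618       # molar gas constant, J mol^-1 K^-1
K_PLUS = 1e6          # hybridization rate constant k_+, M^-1 s^-1 (value quoted in the text)


def effective_concentration_molar(segment_lengths_nm):
    """Gaussian estimate of c_eff for a freely jointed chain of rigid segments.

    For a freely jointed chain, <R^2> is the sum of squared segment lengths.
    The Gaussian end-to-end density at zero separation is
    (3 / (2 pi <R^2>))^(3/2) per nm^3 (assumption: Gaussian statistics).
    """
    mean_sq = sum(length * length for length in segment_lengths_nm)
    density_per_nm3 = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    # 1 L = 1e24 nm^3; dividing by N_A converts molecules per litre to mol/L.
    return density_per_nm3 * 1e24 / N_A


def first_domain_rate(staple_conc_molar):
    """An unbound staple at concentration c binds with rate k_+ c."""
    return K_PLUS * staple_conc_molar


def second_domain_rate(c_eff_molar):
    """The free half of a part-bound staple binds with rate k_+ c_eff."""
    return K_PLUS * c_eff_molar


def unbinding_rate(dg0_duplex_j_per_mol, temp_k):
    """Configuration-independent k_- = k_+ exp(dG0,duplex / RT) x (1 M)."""
    return K_PLUS * math.exp(dg0_duplex_j_per_mol / (R * temp_k)) * 1.0


# Hypothetical example: a loop closed through three rigid 16-bp duplex segments.
c_eff = effective_concentration_molar([16 * 0.34] * 3)  # roughly 0.65 mM
```

With these assumptions c_eff comes out at a fraction of a millimolar, and lengthening the connecting path lowers it, which is the sense in which short-range contacts are favoured.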

The histograms in Fig. 4 show distributions of offset values, measured by fitting AFM data (Extended Data Figs 3 and 4), and the corresponding distributions between the discrete shapes shown in Fig. 3 that are predicted by the model. Figure 4a corresponds to the staple set described above (see Fig. 1): structures with each of the seam configurations 5:0, 4:1 and 3:2 are observed. The model suggests that the folding pathway depends on competition between body and seam staples. If local interactions mediated by body staples were to form first and dominate the outcome, the system would prefer the 5:0i fold (see Fig. 3f for nomenclature), in which all body staples are bound to two domains that are as close as possible along the template. In this fold, no staples link the two halves of the template. However, strong seam connections that are inserted early in the folding pathway favour a more uniform distribution between all possible seam configurations: for example, once the part-folded structure 1:1 has formed, the 5:0 fold is inaccessible unless at least one seam connection is broken (Extended Data Fig. 5). With the staple set shown in Fig. 4a, each seam contact is bridged by two staples. The cooperative binding of seam staple pairs offsets the increased entropic cost of forming long-range contacts, with the result that seam staples are incorporated at a similar temperature to body staples in both model (Fig. 4a) and experiment (Extended Data Fig. 6). Consequently, the model predicts that all seam configurations should be observed, consistent with experimental observations.
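The entropic cost mentioned above can be read off the rates given earlier. The following derivation is our reading under detailed balance, not an equation quoted from the paper: for the step in which the free half of a part-bound staple binds, the ratio of the closing rate $k_+ c_{\mathrm{eff}}$ to the configuration-independent unbinding rate $k_- = k_+ \exp(\Delta G^{0,\mathrm{duplex}}/RT)\,(1\,\mathrm{M})$ fixes the implied free-energy change $\Delta G_{s,s'}$ between the part-bound state $s$ and the fully bound state $s'$.

```latex
\frac{k_+ c_{\mathrm{eff}}}{k_-}
  = \frac{c_{\mathrm{eff}}}{1\,\mathrm{M}}
    \exp\!\left(-\frac{\Delta G^{0,\mathrm{duplex}}}{RT}\right)
  = \exp\!\left(-\frac{\Delta G_{s,s'}}{RT}\right)
\quad\Longrightarrow\quad
\Delta G_{s,s'} = \Delta G^{0,\mathrm{duplex}} - RT \ln\frac{c_{\mathrm{eff}}}{1\,\mathrm{M}}
```

For a hypothetical $c_{\mathrm{eff}} = 1$ mM at $T = 330$ K, the second term is $RT\ln 10^3 \approx 19$ kJ mol$^{-1}$ (about 4.5 kcal mol$^{-1}$). The penalty shrinks as $c_{\mathrm{eff}}$ grows, which is one way to see why the first staple of a seam pair, by holding two distant template domains together, makes binding of its partner much cheaper.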

We predict that the folding pathway can be changed by altering the relative strengths of short- and long-range interactions. Breaking in half one of each pair of seam staples (Fig. 4b), so that the pairs no longer bind cooperatively, weakens these long-range bonds, causing them to form later in the folding pathway (Fig. 4b, central panel and Extended Data Fig. 6) and to break and reform in alternative configurations more frequently (Extended Data Fig. 7). With weakened long-range interactions, we expect folding to be governed primarily by local interactions. The model predicts that the distribution of shapes is shifted strongly towards the 5:0i fold, in which all body staples span the smallest possible distances along the template (the same distances as in the monomer tile), and this is confirmed by experiment (Fig. 4b). The thermodynamic cost of breaking every other seam staple is approximately equal in each well-folded state, and therefore this change should not affect their equilibrium populations. We have changed the distribution between folds not by changing the relative stability of the final states but by deliberately controlling the stabilities of crucial intermediate states, thus shaping the folding pathway.

The importance of stable, long-range interactions in determining the folding pathway is revealed by the evolving correlations between seam staples in the model. Characteristic patterns of correlation can be used to predict the final fold even before seam staple occupancy has reached 50% (Extended Data Figs 8 and 9).

The influence of seam staples on folding is similar to that of disulphide bonds in Anfinsen’s experiment on protein folding. If long-range bonds are allowed to form first and, effectively, irreversibly, then folding is kinetically trapped. If they are weakened and permitted to rearrange, then folding can be controlled by weaker short-range interactions.

Figure 3 | Classification of well-folded shapes. a–c, Folds of the dimer origami tile can be classified by the pattern of interactions mediated by the seam staples: these contacts are shown schematically in diagrams in which the dimer template is represented as a circle. Fold m:n has m seam contacts between domains within each copy of the monomer sequence and n seam contacts that form connections between the two copies (that is, connecting template domains of different colour). The fold 5:0 can be further divided into shapes that differ in the offset along the long edge of the two tiles (5:0i, 5:0ii and so on; c). d, The set of legal folds allowed by the model includes the configurations shown in a–c and an additional set of non-planar configurations (NP), one of which is shown (see Extended Data Fig. 2 for the complete set). e, Tiles observed in folding experiments can be classified according to the fractional offset along the short or long edge of the tile (w/W and l/L, respectively). f, Seven unique shapes corresponding to well-folded configurations (see also Extended Data Fig. 1). g, Gallery of shapes observed by AFM in a typical experiment with measured fractional offsets (each image is 300 × 300 nm). A bin size of 0.1 is used in histograms of fitted fractional offsets (Fig. 4).


Figure 4c shows an alternative staple set incorporating extended staples that form particularly strong short-range connections and therefore bind to the template early in the folding process (Fig. 4c, central panel). Without interference from other staples, these contacts are most likely to form between the pairs of template domains with the smallest separation along the template. These preferred contacts occur in the 3:2 and 5:0 folds but not the 4:1 fold (in the 4:1 fold, one extended staple forms a long-range contact between the two halves of the template). Experimental results confirm the model prediction that the 4:1 fold is strongly suppressed (Fig. 4c). As with the broken seam staples (Fig. 4b), this modification guides the folding pathway without imposing an energetic penalty on alternative folds.

We can control the fold of the dimer very effectively by engineering both the folding pathway and the stability of the chosen target structure. The 3:2 configuration can be favoured by weakening the original seams (as in Fig. 4b) and adding new seam staples that bridge between the monomer tiles without distortion only in the 3:2 configuration (Fig. 4d). This modification guides folding by increasing the stability of 3:2 relative to other folds. Similarly, a long staple in the bottom right corner of the monomer tile (Fig. 4e) biases folding towards the 5:0iv shape by decreasing the stability of other folds, which would require introduction of a sharp bend within the long staple. (The model does not include any penalty for bending and so fails to predict the engineered bias in this case.)

By showing that an origami tile with a duplicated template can be annealed to produce a high yield of well-folded structures from among $\sim 10^{23}$ disordered alternative staple configurations, our results confirm that, as in the case of proteins, efficient folding pathways exist and that folding is highly cooperative. We infer that the folding of all DNA origami is shaped by similar pathways. Manipulation of the folding pathway validates our simple folding model, which successfully predicts the dominant folding pathways observed in experiments. We anticipate that this tool will prove more generally useful for establishing how to change the relative strengths of local and long-range staple interactions to rationally steer the folding pathway towards desired target structures.

Figure 4 | Folding can be guided by modifying staples to steer the folding pathway. The reference staple set (a) folds to give a distribution of shapes that are characterized by the fractional offset between the two component tiles along the long or short edge. Modifications to the reference staple set (b–e) were designed to fold into specific target shapes. Left-hand panels show the staple configurations and the seam-staple contacts in the target structures. The top left rectangle of each target shape is used to highlight modified staples in red. The distance between the two template domains linked by a staple depends on the fold: in the bottom right rectangle, staples are grouped and coloured according to the distance spanned (see key in a: short-range body staples are blue, seam staples are brown to yellow; lighter shades indicate larger distances). Graphs in the central panels show the calculated fraction of contacts formed for each staple group (whether or not as part of the target structure) as a function of temperature during assembly. The right-hand panels show the distribution of shapes predicted and observed for each set: a histogram representing the continuous distribution of fitted offset values is plotted above the distribution between discrete shapes predicted by the model (indicated by silhouettes). The number of fitted shapes, N, and the yield (shapes fitted as a percentage of candidate structures identified) are shown in the top right corner of each histogram.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Experimental methods. Plasmid pUC19 cut with HindIII and EcoRI was amplified by PCR with the primers TGACCTAATCCTCAGCAATTCACTGGCCGTCGTTTTACAA and ACGGACGCGCTGAGGAGCTTGGCGTAATCATGGTCATAG in order to trim the template to the desired length and introduce a unique BbvCI site. The PCR product was cut with BbvCI and ligated to generate pKD1 (2,646 bp). A typical monomer plasmid preparation contains a small amount (~1%) of plasmid dimer. The dimer plasmid was obtained by nicking a monomer plasmid preparation with Nt.BbvCI (in order to resolve monomer and dimer more easily), purifying the nicked dimer band from a 0.7% TAE agarose gel, then transforming the purified nicked dimer into the recA− host DH5α. The template sequence is given in Supplementary Information.

Single-stranded template was prepared by sequential reaction of either monomer or dimer pKD1 with Nt.BspQI at 50 °C and ExoIII at 37 °C to digest the non-template strand and leave a covalently closed single-stranded template. Enzymes were removed by phenol:chloroform extraction and the template was recovered by ethanol precipitation; its concentration was then determined by measuring ultraviolet absorbance at 260 nm.

DNA origami was designed using caDNAno and was assembled by cooling template at 4–10 nM with a ~10-fold excess of staples from 95 °C to 25 °C at 1 °C per minute in a buffer containing 40 mM Tris-acetate (pH 8.3) and 12.5 mM magnesium acetate. Excess staples were removed using an S-300 size exclusion spin column. Staple sequences for the standard design and variations are given in Supplementary Information.

Atomic force microscopy images were acquired using either an Agilent 5500 AFM with Olympus TR400-PSA probes (Figs 1–3, 4a) or a Veeco Dimension 3100 with Bruker SNL-10 probes (all other figures). A few microlitres of sample were added to freshly cleaved mica, and the sample was imaged in tapping mode in an imaging buffer containing 12.55 mM magnesium acetate, 4 mM NiCl2, 1 mM EDTA and 40 mM Tris-acetate pH 8.0–8.3 (the imaging buffer for Fig. 1c lacked NiCl2, and the imaging buffer for Fig. 2c lacked EDTA).
Folding model. Our domain-level description of origami assembly is intended to reproduce some aspects of cooperativity. In particular, it accounts for the increase in incorporation rate for a staple when its target domains on the template are held more closely together as a result of the earlier binding of other staples. This effect is most noticeable in the seam, where the binding of the first of a pair of seam staples greatly accelerates, and is stabilized by, the binding of the second. The model incorporates a physically reasonable approximation of the entropic cost of closing loops by staple binding, but is far from a complete description of the physics of assembly. It is useful in guiding, and providing insights into, the effects of significant changes to the origami design.

We model the folding of an isolated template in the presence of an excess of staples as an inhomogeneous continuous-time Markov chain. Each transition between states corresponds to the binding or unbinding of a single staple domain. Transition rates between two states are chosen according to an estimate of the free energy difference between the two, in a manner that would reproduce the correct Boltzmann distribution if this free energy difference were calculated exactly. The temperature is updated once per second of simulated time, which allows us to use an event-based Gillespie simulation algorithm with transition rates fixed over one-second intervals. Data on folding processes are collected by simulating multiple folding trajectories (typically 1,600 per experiment).
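The scheduling idea above (rates held fixed within each one-second interval, temperature stepped between intervals) can be sketched for a single staple domain. This is a minimal illustration, not the authors' simulator: with only one possible transition out of each state, the "which reaction fires" step of the Gillespie algorithm is trivial, whereas the full model sums rates over every staple domain. The enthalpy, entropy and staple concentration below are hypothetical values chosen so that the domain melts near 53 °C; the anneal follows the 95 °C to 25 °C, 1 °C per minute protocol described above.

```python
import math
import random

R = 8.314462618        # molar gas constant, J mol^-1 K^-1
K_PLUS = 1e6           # k_+, M^-1 s^-1 (value quoted in the main text)

# Hypothetical parameters for ONE staple domain (illustration only, not from the paper).
DELTA_H = -500e3       # duplex enthalpy, J mol^-1
DELTA_S = -1.4e3       # duplex entropy, J mol^-1 K^-1
STAPLE_CONC = 100e-9   # free staple concentration, M


def temperature_kelvin(t_s, start_c=95.0, end_c=25.0, c_per_min=1.0):
    """Anneal schedule, held constant within each second of simulated time."""
    t_c = start_c - c_per_min * math.floor(t_s) / 60.0
    return max(t_c, end_c) + 273.15


def total_rate(bound, temp_k):
    """Rate of the single transition available from the current state."""
    if bound:
        dg = DELTA_H - temp_k * DELTA_S               # dG0,duplex(T)
        return K_PLUS * math.exp(dg / (R * temp_k))   # k_- = k_+ exp(dG/RT) (1 M)
    return K_PLUS * STAPLE_CONC                       # k_+ c


def simulate_anneal(seed=0, start_c=95.0, end_c=25.0, c_per_min=1.0):
    """Event-driven simulation; returns (final bound state, number of events)."""
    rng = random.Random(seed)
    duration = (start_c - end_c) / c_per_min * 60.0
    t, bound, n_events = 0.0, False, 0
    while t < duration:
        boundary = math.floor(t) + 1.0                # next temperature update
        temp_k = temperature_kelvin(t, start_c, end_c, c_per_min)
        wait = rng.expovariate(total_rate(bound, temp_k))
        if t + wait < boundary:
            t += wait
            bound = not bound
            n_events += 1
        else:
            # No event before the rates change. Exponential waiting times are
            # memoryless, so the clock can simply restart at the boundary.
            t = boundary
    return bound, n_events
```

Collecting statistics over many seeds, as the text does with about 1,600 trajectories, would amount to calling `simulate_anneal(seed=i)` in a loop.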

Subsequent sections contain more detailed descriptions of the folding model.
State space. We consider the possible configurations of staples hybridized to the template with domain-level resolution: a domain is either fully hybridized or unhybridized. A staple is called half-bound if only one of its two domains is hybridized to the template and fully bound if both domains are bound. In the model, a staple domain can only hybridize to the complementary template domain; we ignore weaker interactions that result from inevitable partial sequence complementarity between other pairs of domains.

For each type of two-domain staple (and the corresponding two pairs of complementary template domains) there are 34 distinct patterns of domain binding (states) with between zero and four copies of the staple bound to the dimer template. One is an empty state. When one staple is bound to the template there are four states in which the staple is half-bound and four states in which the staple is fully bound. When two staples are bound to the template there are six states in which both staples are half-bound, eight states with one half-bound and one fully bound staple, and two states with two fully bound staples. There are four states with three half-bound staples and another four states with one fully bound and two half-bound staples. Finally, there is the possibility that four half-bound staples are attached to the template. For a single-domain staple and the associated pair of template domains there are just four states. There are therefore $34^x \times 4^y$ states of the dimer template with staples, including part-folded states, where x is the number of two-domain staples and y is the number of single-domain staples. Of these, $2^x$ states consist exclusively of fully bound staples. Formally, the state space S is given by $S = p_0 \times p_1 \times \dots \times p_{k-1}$, where $p_i$ denotes the set of possible states for staple i as described above and k is the total number of staples.
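The count of 34 states can be checked by brute-force enumeration. The sketch below labels the two template copies of the domain complementary to staple domain a as A1, A2 and those complementary to domain b as B1, B2 (hypothetical labels). It treats a fully bound staple as a pairing of one A site with one B site and a half-bound staple as occupying a single site, with staple copies indistinguishable. It then tallies states by (fully bound, half-bound) counts for comparison with the breakdown in the text.

```python
from collections import Counter
from itertools import combinations, product

A_SITES = ("A1", "A2")  # template domains complementary to staple domain a
B_SITES = ("B1", "B2")  # template domains complementary to staple domain b


def two_domain_staple_states():
    """All binding patterns of one two-domain staple type on the dimer template.

    A state is (set of fully bound A-B pairs, set of sites holding half-bound staples).
    """
    pairs = [(a, b) for a in A_SITES for b in B_SITES]
    # Fully bound staples may not share a template site: matchings of size 0, 1 or 2.
    matchings = [()] + [(p,) for p in pairs]
    matchings += [(p, q) for p, q in combinations(pairs, 2)
                  if p[0] != q[0] and p[1] != q[1]]
    states = set()
    for matching in matchings:
        used = {site for pair in matching for site in pair}
        free = [s for s in A_SITES + B_SITES if s not in used]
        # Every remaining site is either empty or holds a half-bound staple.
        for occupied in product((False, True), repeat=len(free)):
            half = frozenset(s for s, o in zip(free, occupied) if o)
            states.add((frozenset(matching), half))
    return states


def single_domain_staple_states():
    """A single-domain staple: each of its two template copies is empty or bound."""
    return set(product((False, True), repeat=2))


def state_space_size(x, y):
    """34^x * 4^y states for x two-domain and y single-domain staple types."""
    return 34 ** x * 4 ** y


states = two_domain_staple_states()
tally = Counter((len(full), len(half)) for full, half in states)
fully_folded = [s for s in states if len(s[0]) == 2]  # both copies fully bound
```

Only two states per two-domain staple have both copies fully bound, which gives the $2^x$ fully folded patterns; taking x = 76, the number of staples that link non-contiguous template domains in the main text, gives $2^{76} \approx 7.6 \times 10^{22}$.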

Exclusion algorithm. Two template domains hybridized to a single two-domain staple are held within a few tenths of a nanometre of each other at the staple crossover: many of the folds in S cannot meet this constraint. We use an algorithm that gives an approximate representation of steric constraints, preventing the model from accessing unrealistic states. Because it is only an approximation, it does not guarantee that each legal state satisfies the real steric constraints or that all states that satisfy the steric constraints are legal.

We define a connected segment of an origami as a set of hybridized domains 
such that each domain can be reached from each other domain without leaving 
the set. Two template domains hybridized to the same staple are defined to be 
connected, as are two adjacent template domains hybridized to different staples. 
A partially folded segment of origami is considered stress-free (is legal) when it 
occurs in one of the set of well-ordered, two-dimensional folds shown in Extended 
Data Figs 1 or 2. These pre-defined folds satisfy the constraints imposed by finite 
staple length and steric exclusion. 

More formally, we can represent the physical origami in partially folded state $s \in S$ as an abstract graph $G(s) = (V, E)$ such that each boundary between adjacent domains is a vertex $v \in V$ and each template domain and staple crossover is an edge $e \in E$ between the appropriate vertices. A labelling function $f: E \to \{\text{single-stranded}, \text{double-stranded}, \text{crossover}\}$ assigns each edge its status. We can draw subgraphs consisting of connected hybridized segments of the graph: for the origami to be in a legal (stress-free) state, each of these subgraphs must be present in a single well-ordered fold from the set shown in Extended Data Figs 1 and 2.
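The first step of this check, finding the connected segments, can be sketched with a union-find structure. This is a simplified, hypothetical version working directly on template domains rather than on the boundary-vertex graph G(s) described above: domains are numbered around the circular template, and, following the definition given earlier, two hybridized domains are joined if they are adjacent on the template or linked by the same fully bound staple. Matching each segment against the pre-defined folds of Extended Data Figs 1 and 2 is not implemented here.

```python
def connected_segments(n_domains, hybridized, staple_links):
    """Group hybridized template domains into connected segments.

    n_domains    -- number of domains around the circular template (0..n-1)
    hybridized   -- iterable of hybridized domain indices
    staple_links -- (i, j) pairs of domains held together by one fully bound staple
    Returns the segments as sorted lists, ordered by their smallest domain.
    """
    hybridized = set(hybridized)
    parent = {d: d for d in hybridized}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Adjacent hybridized domains are connected (the template is circular).
    for d in hybridized:
        nxt = (d + 1) % n_domains
        if nxt in hybridized:
            union(d, nxt)
    # Two domains hybridized to the same staple are connected.
    for a, b in staple_links:
        if a not in hybridized or b not in hybridized:
            raise ValueError("a staple can only link hybridized domains")
        union(a, b)

    groups = {}
    for d in hybridized:
        groups.setdefault(find(d), []).append(d)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

For example, on a hypothetical 12-domain template with domains {0, 1, 2, 5, 6, 9} hybridized and one staple linking 2 and 9, the segments are [0, 1, 2, 9] and [5, 6].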

Misfolds occur in the model when at least two connected segments would be incapable of satisfying the constraints were they to become connected to each other. At that point, folding cannot advance unless one of the segments unfolds, allowing another to expand. Extended Data Fig. 2c shows a misfolded dimer that has three connected parts that cannot be joined to form a stress-free state. When simulating assembly using the staple set corresponding to Fig. 4a, about half of the simulations end in a misfolded state; for the weakened-seam variant (Fig. 4b) there are only ~1% misfolds.
Rates model. We develop a kinetic model of folding based on standard reaction models for hybridization and a method to estimate the effective local concentration of the unhybridized domain of a half-bound staple at its complementary template domain.

Consider complementary strands A and B that can bind reversibly to form duplex AB. Under the assumptions of mass action kinetics, the concentration [AB] is described by

$$\frac{d[AB]}{dt} = k_+[A][B] - k_-[AB] \qquad (1)$$

for rate constants $k_+$ and $k_-$. The rate constants are constrained by the requirement that the equilibrium concentrations {A}, {B} and {AB} are consistent with $\Delta G^{0,\mathrm{duplex}}(T)$, the standard change in Gibbs free energy on duplex formation:

$$\frac{\{AB\}}{\{A\}\{B\}}\,(1\,\mathrm{M}) = \frac{k_+}{k_-}\,(1\,\mathrm{M}) = \exp\left(-\frac{\Delta G^{0,\mathrm{duplex}}(T)}{RT}\right) \qquad (2)$$

where R denotes the molar gas constant and T the temperature.
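Equations (1) and (2) can be checked numerically with a short Python sketch. It assumes concentrations in M and free energies in kcal mol⁻¹, and uses illustrative values (100 nM of each strand, $\Delta G^{0,\mathrm{duplex}} = -12$ kcal mol⁻¹ at 60 °C) that are not taken from the model's parameterization. It derives $k_-$ from equation (2), solves for the equilibrium duplex concentration and evaluates the right-hand side of equation (1) there.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1
ONE_MOLAR = 1.0       # standard-state concentration, M

def k_minus(k_plus, dG0_kcal, T):
    """Unbinding rate constant (s^-1) implied by eq. (2): k_+ (1 M) / k_- = exp(-dG0/RT)."""
    return k_plus * ONE_MOLAR * math.exp(dG0_kcal / (R_KCAL * T))

def equilibrium_duplex(A0, B0, k_plus, k_m):
    """Equilibrium [AB] solving k_+ (A0 - x)(B0 - x) = k_- x with 0 <= x <= min(A0, B0)."""
    b = A0 + B0 + k_m / k_plus
    # Numerically stable form of the smaller root of x^2 - b x + A0 B0 = 0.
    return 2.0 * A0 * B0 / (b + math.sqrt(b * b - 4.0 * A0 * B0))

def dAB_dt(A, B, AB, k_plus, k_m):
    """Right-hand side of eq. (1)."""
    return k_plus * A * B - k_m * AB

# Illustrative values only.
T, k_plus, A0, B0, dG0 = 333.15, 1e6, 1e-7, 1e-7, -12.0
k_m = k_minus(k_plus, dG0, T)
x = equilibrium_duplex(A0, B0, k_plus, k_m)
print(k_m, x, dAB_dt(A0 - x, B0 - x, x, k_plus, k_m))
```

At the computed equilibrium the net rate in equation (1) vanishes to rounding error, and the ratio {AB}/({A}{B}) reproduces the Boltzmann factor of equation (2).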

For staples within a partially folded origami, binding and unbinding rates are similarly constrained by the difference in free energy between states. We approximate the difference in free energy between partially folded states s, s′ that differ by the hybridization of a single template domain as

$$\Delta G_{s,s'} = \Delta G^{0,\mathrm{duplex}}_{s,s'} + \Delta G^{\mathrm{shape}}_{s,s'} \qquad (3)$$

where $\Delta G^{0,\mathrm{duplex}}_{s,s'}$ is the standard free energy change corresponding to the formation or dissociation of an equivalent isolated duplex (equal to $\Delta G^{0,\mathrm{duplex}}$ for binding and $-\Delta G^{0,\mathrm{duplex}}$ for unbinding) and $\Delta G^{\mathrm{shape}}_{s,s'}$ represents the entropic free-energy change corresponding to the geometric constraints on the template that arise when two-domain staples connect non-contiguous template domains (‘looping constraints’). $\Delta G^{\mathrm{shape}}_{s,s'}$ quantifies cooperative effects: when a single staple domain binds or unbinds, $\Delta G^{\mathrm{shape}}_{s,s'}$ depends on the pattern of binding of other staples.

Consider a single, isolated origami in partially folded state $s_{00}$ and let staple p bind to the template by a single domain, resulting in state $s_{01}$. The rate for this reaction is taken to be equal to that for duplex formation between isolated strands:

$$\sigma(s_{00}, s_{01}) = k_+[p] \qquad (4)$$

where $\sigma(s, s')$ is the rate of transition from state s to s′. The unbinding rate is then determined by a thermodynamic constraint analogous to equation (2):

$$\sigma(s_{01}, s_{00}) = k_+ \exp\left(\frac{\Delta G^{0,\mathrm{duplex}}}{RT}\right)(1\,\mathrm{M}) \qquad (5)$$

We have set $\Delta G^{\mathrm{shape}}_{s_{00},s_{01}} = 0$ because transitions $s_{00} \leftrightarrow s_{01}$ do not create or destroy loops in the template. (We do not take into account other ways in which hybridization of a single staple domain affects the free energy of the partly folded origami, for example, by changing the mechanical properties and thus the free-energy cost of any pre-existing loop of which it forms part.) For the second domain of the staple, once the first domain is bound, we again fix the unbinding rate to be that of the corresponding isolated duplex. This rate does not depend on the change in entropy that results from the removal of a looping constraint because, immediately after unbinding, the conformation of the template is unchanged:

$$\sigma(s_{11}, s_{01}) = k_+ \exp\left(\frac{\Delta G^{0,\mathrm{duplex}}}{RT}\right)(1\,\mathrm{M}) \qquad (6)$$

where $s_{11}$ denotes the state in which the staple is bound to the template with both domains. The binding rate of the second domain of the staple, once the first domain is bound, can then be found from the thermodynamic constraint

$$\sigma(s_{01}, s_{11}) = \sigma(s_{11}, s_{01}) \exp\left(-\frac{\Delta G_{s_{01},s_{11}}}{RT}\right) = k_+ \exp\left(\frac{\Delta G^{0,\mathrm{duplex}} - \Delta G_{s_{01},s_{11}}}{RT}\right)(1\,\mathrm{M}) = k_+ \exp\left(-\frac{\Delta G^{\mathrm{shape}}_{s_{01},s_{11}}}{RT}\right)(1\,\mathrm{M}) \qquad (7)$$

The free energy penalty $\Delta G^{\mathrm{shape}}_{s_{01},s_{11}}$, which corresponds to the additional geometric constraints associated with the binding of the second staple domain, thus determines the binding rate for the second domain.
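The four transition rates of a single two-domain staple, equations (4) to (7), can be collected in one function. This Python sketch assumes kcal mol⁻¹ and M units and takes the loop penalty $\Delta G^{\mathrm{shape}}_{s_{01},s_{11}}$ as an input; the state labels, function name and the value $\Delta G^{0,\mathrm{duplex}} = -12$ kcal mol⁻¹ are illustrative. A useful check is that each forward/reverse pair obeys detailed balance with $\Delta G_{s,s'}$ from equation (3).

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1
ONE_MOLAR = 1.0       # standard-state concentration, M

def staple_rates(k_plus, p_conc, dG0_duplex, dG_shape, T):
    """Rates (s^-1) for one two-domain staple, eqs (4)-(7).

    States: s00 unbound, s01 bound by one domain, s11 bound by both domains.
    dG0_duplex: free energy of forming one isolated domain duplex (kcal/mol, negative).
    dG_shape: geometric penalty for binding the second domain (kcal/mol).
    """
    RT = R_KCAL * T
    unbind = k_plus * math.exp(dG0_duplex / RT) * ONE_MOLAR        # eqs (5) and (6)
    return {
        ("s00", "s01"): k_plus * p_conc,                            # eq. (4)
        ("s01", "s00"): unbind,                                     # eq. (5)
        ("s11", "s01"): unbind,                                     # eq. (6)
        ("s01", "s11"): k_plus * math.exp(-dG_shape / RT) * ONE_MOLAR,  # eq. (7)
    }

T = 333.15
dG_shape = R_KCAL * T * math.log(1.0 / 51e-6)  # loop penalty for c_eff = 51 uM
rates = staple_rates(k_plus=1e6, p_conc=1e-7, dG0_duplex=-12.0, dG_shape=dG_shape, T=T)
print(rates[("s01", "s11")])  # second-domain binding rate, s^-1
```

With a loop penalty chosen to correspond to an effective concentration of 51 μM, the second-domain binding rate comes out at about 51 s⁻¹, that is, $k_+ c_{\mathrm{eff}}$.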

Looping constraints. We approximate $\Delta G^{\mathrm{shape}}_{s_{01},s_{11}} = \Delta G^{\mathrm{loop}}$, where $\Delta G^{\mathrm{loop}}$ corresponds to the entropic penalty of closing the new loop that forms in the template when the second domain of a staple binds. For other transitions, no loop forms and we take $\Delta G^{\mathrm{shape}} = 0$. $\Delta G^{\mathrm{loop}}$ quantifies the difference between the entropic penalties for pinning the template into a loop so that the second staple and template domains can bind and for bringing together two domains unconnected by a loop in a hypothetical ideal system at standard conditions (1 M concentration). $\Delta G^{\mathrm{loop}}$ is thus related to the ratio between the probabilities of bringing two domains into contact in the looped system and in the ideal unconnected system:

$$\Delta G^{\mathrm{loop}} = RT \ln\left(\frac{P^{0}_{\mathrm{ideal}}}{P_{\mathrm{loop}}}\right) \qquad (8)$$

Here, $P_{\mathrm{loop}}$ is the probability that the origami adopts a conformation in which the unbound staple arm and the template domain are spontaneously within an interaction radius $r_0$ of each other, where $r_0$ is an unspecified small distance necessary for closure of the loop. $P^{0}_{\mathrm{ideal}}$ is the probability that two unconnected molecules would be within $r_0$ in a hypothetical ideal system of volume $v^0 = 1/N_A$ litres, $N_A$ being Avogadro's number, so that $P^{0}_{\mathrm{ideal}} = \frac{4}{3}\pi r_0^3 / v^0$. Substituting equation (8) into equation (7), the rate of hybridization of a second staple domain is therefore given by

$$\sigma(s_{01}, s_{11}) = k_+ \frac{P_{\mathrm{loop}}}{P^{0}_{\mathrm{ideal}}}\,(1\,\mathrm{M}) = k_+ \frac{P_{\mathrm{loop}}}{\frac{4}{3}\pi r_0^3 N_A} \qquad (9)$$

where $c_{\mathrm{eff}} = P_{\mathrm{loop}} / (\frac{4}{3}\pi r_0^3 N_A)$ denotes the effective concentration of the opposing domain.

As a first approximation we treat the loop of DNA as a freely jointed chain comprising two types of link, double-stranded DNA and single-stranded DNA (dsDNA and ssDNA, respectively). Let P(r) be the probability density for the end-to-end extension r of the chain. Then

$$P_{\mathrm{loop}} = \int_0^{r_0} P(r)\,dr$$

is the probability that the two domains are separated by at most $r_0$.
The end-to-end distance distribution P(r) of a freely jointed chain, in the limit of a large number of segments, is

$$P(r) = 4\pi r^2 \left(\frac{3}{2\pi E[r^2]}\right)^{3/2} \exp\left(-\frac{3r^2}{2E[r^2]}\right) \qquad (10)$$

where $E[r^2]$ is the mean squared distance between the two ends. The result for a single segment type is a classic result of statistical physics. The following argument shows that the result also holds for a chain with heterogeneous segments. From the central limit theorem, for a large number of segments we expect a Gaussian distribution over the x, y and z components of r. Because each segment is oriented independently and isotropically, E[x] = E[y] = E[z] = 0, E[xy] = E[xz] = E[yz] = 0 and E[x²] = E[y²] = E[z²] = E[r²]/3; equation (10) is the only Gaussian distribution that satisfies these symmetry conditions.
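The heterogeneous-segment argument can be checked by simulation. This Python sketch draws freely jointed chains with a hypothetical mix of segment lengths (40 ssDNA Kuhn segments of 1.8 nm and three rigid 16-bp duplexes of 16 × 0.34 nm, using the chain parameters given under ‘Parameterization of the model’) and compares the sampled mean squared end-to-end distance with $\sum_i b_i^2$, the value implied by independent, isotropic segment orientations.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly distributed direction: normalise an isotropic Gaussian vector."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return x / norm, y / norm, z / norm

def sample_r2(lengths, n_samples, seed=0):
    """Squared end-to-end distances of freely jointed chains with the given segment lengths."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        rx = ry = rz = 0.0
        for b in lengths:
            ux, uy, uz = random_unit_vector(rng)
            rx += b * ux
            ry += b * uy
            rz += b * uz
        samples.append(rx * rx + ry * ry + rz * rz)
    return samples

# Hypothetical heterogeneous chain: 40 ssDNA Kuhn segments plus three rigid 16-bp duplexes.
lengths = [1.8] * 40 + [16 * 0.34] * 3
r2 = sample_r2(lengths, 10000)
expected = sum(b * b for b in lengths)  # E[r^2] = sum of b_i^2 for independent orientations
print(sum(r2) / len(r2), expected)
```

The sampled mean agrees with $\sum_i b_i^2$ to within sampling error, even though the three duplexes are much longer than the ssDNA segments.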

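The internal association rate is therefore given by:

$$\sigma(s_{01}, s_{11}) = \frac{k_+}{\frac{4}{3}\pi r_0^3 N_A}\int_0^{r_0} 4\pi r^2 \left(\frac{3}{2\pi E[r^2]}\right)^{3/2} \exp\left(-\frac{3r^2}{2E[r^2]}\right) dr \approx \frac{k_+}{N_A}\left(\frac{3}{2\pi E[r^2]}\right)^{3/2} \qquad (11)$$

where we have assumed $r_0 \ll \sqrt{E[r^2]}$ in the second step.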
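Equations (8) and (11) reduce the loop calculation to a function of $E[r^2]$ alone. A minimal Python sketch, assuming $E[r^2]$ in nm², T in kelvin and $k_+ = 10^6$ M⁻¹ s⁻¹ as set under ‘Parameterization of the model’; the function names are illustrative.

```python
import math

N_A = 6.02214076e23     # Avogadro's number, mol^-1
R_KCAL = 1.987204e-3    # molar gas constant, kcal mol^-1 K^-1
NM3_PER_LITRE = 1e24    # 1 L = 1e24 nm^3

def effective_concentration(mean_r2_nm2):
    """c_eff in M, from the r0 << sqrt(E[r^2]) limit of eq. (11)."""
    density_nm3 = (3.0 / (2.0 * math.pi * mean_r2_nm2)) ** 1.5  # end-to-end density at r = 0, nm^-3
    return density_nm3 * NM3_PER_LITRE / N_A

def loop_penalty(c_eff_molar, T):
    """Delta G^loop = RT ln(P0_ideal / P_loop) = RT ln(1 M / c_eff), kcal/mol (eq. 8)."""
    return R_KCAL * T * math.log(1.0 / c_eff_molar)

def second_domain_rate(mean_r2_nm2, k_plus=1e6):
    """sigma(s01, s11) = k_+ c_eff, in s^-1 (eqs 9 and 11)."""
    return k_plus * effective_concentration(mean_r2_nm2)

T = 273.15 + 60.0
for r2 in (480.0, 2400.0):
    c = effective_concentration(r2)
    print(f"E[r^2]={r2:.0f} nm^2  c_eff={c * 1e6:.1f} uM  "
          f"dG_loop={loop_penalty(c, T):.2f} kcal/mol  sigma={second_domain_rate(r2):.1f} s^-1")
```

For $E[r^2]$ = 480 and 2,400 nm² at 60 °C this gives roughly 52 μM and 4.7 μM, with loop costs of about 6.5 and 8.1 kcal mol⁻¹, close to the values quoted in ‘Example rate calculations’.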

The loop that is closed by the insertion of a staple into a part-folded origami has, in general, a complex structure comprising multiply connected domains of single- and double-stranded DNA. We approximate this loop by a single path through the origami: the loop with the smallest expected squared end-to-end distance $E[r^2]$. This path represents the most important constraint that leads to the enhancement of the effective local concentration of one end of the loop at the other, and thus provides the most significant enhancement of $\sigma(s_{01}, s_{11})$. To identify the dominant loop, each edge e ∈ E in the implied graph G(s) = (V, E) of the partially folded origami is assigned a weight equal to its contribution to $E[r^2]$ in the freely jointed chain approximation. Dijkstra's shortest path algorithm is used to determine a loop that minimizes $E[r^2]$ and hence determines $\sigma(s_{01}, s_{11})$.
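The dominant-loop search is a single-source shortest-path problem with non-negative edge weights. A minimal Python sketch of Dijkstra's algorithm with a binary heap follows; the vertex names and weights (in nm²) form a hypothetical toy loop in the spirit of Extended Data Fig. 10b and are not values from the model.

```python
import heapq

def min_loop_r2(adj, source, target):
    """Dijkstra's algorithm: smallest total weight (here E[r^2], nm^2) from source to target.

    adj maps each vertex to a list of (neighbour, weight) pairs; weights must be >= 0.
    Returns float('inf') if target is unreachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already processed
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def add_edge(adj, u, v, w):
    """Undirected edge: template segments and crossovers can be traversed either way."""
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))

# Hypothetical toy loop: a direct ssDNA path versus a detour through a bound staple.
adj = {}
add_edge(adj, "B1", "B2", 933.1)  # long single-stranded stretch
for u, v, w in [("B1", "a", 207.4), ("a", "b", 29.6), ("b", "c", 3.2),
                ("c", "d", 29.6), ("d", "e", 29.6), ("e", "B2", 207.4)]:
    add_edge(adj, u, v, w)  # ssDNA stretches, rigid duplexes and one crossover
print(min_loop_r2(adj, "B1", "B2"))
```

In this toy graph the detour through the bound staple (about 507 nm²) beats the direct single-stranded path (933 nm²), so the detour is the loop that would set $\sigma(s_{01}, s_{11})$.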

For the seam staples, which are paired, the loop closed by hybridization of the second staple is particularly small: it consists only of the crossover link. The predictions of the model remain physically sensible: a second staple binding to a seam has an overall ΔG that is ~4.4 kcal mol⁻¹ less favourable (at T = 60 °C) than a continuous duplex. This destabilization is equal to that expected from a 5-nt bulge within a duplex. We note that for the broken-seam variant, the model predicts incorporation temperatures for the unbroken staple that are lower than the regular case by 2.0 °C, compared to 2.2 °C measured in experiment (Extended Data Fig. 6). This agreement indicates that we do not overestimate the cooperative stabilization of seam staples.

The approximations made in estimating the change in free energy when a staple domain binds or unbinds are not thermodynamically self-consistent: the value assigned to the difference in free energy between states depends, in general, on the path taken between them. Models of this kind will be presented in a companion paper, in which they are compared to thermodynamically self-consistent approaches for simpler systems (F.D. et al., submitted).
Parameterization of the model. Compared to unbinding rates, the rate of binding of an isolated duplex is known to depend only weakly on duplex stability. We assume $k_+$ to be independent of temperature, domain sequence and folding state, and we set $k_+ = 10^6$ M⁻¹ s⁻¹ (refs 25, 36, 37).

The free energy change when each domain binds to its complement, $\Delta G^{0,\mathrm{duplex}}$, is taken to be that of a 16-bp DNA duplex averaged over all possible sequences. Buffer conditions of 40 mM [Tris] and 12.5 mM [Mg²⁺] are assumed, giving an additional entropic penalty (in units of cal mol⁻¹ K⁻¹) for duplex formation of:

$$\Delta S^{0} = 0.368 \times \frac{N}{2} \times \ln\left(\frac{[\mathrm{Tris}]}{2} + 3.3\,[\mathrm{Mg}^{2+}]^{1/2}\right) \qquad (12)$$
where N is the number of phosphates in the duplex. For ssDNA we use a contour length of $L_{ss} = 0.6$ nm per base and a Kuhn length of $\lambda_{ss} = 1.8$ nm: a single-stranded domain of 16 bases thus has a contour length of 16 × 0.6 nm. For dsDNA we use a contour length of $L_{ds} = 0.34$ nm per base pair and make the approximation that the persistence length is much longer than any relevant duplex: a double-stranded domain of 16 base pairs thus corresponds to a single rigid link of length $\lambda_{ds} = 16 \times 0.34$ nm. A crossover link between the two template domains hybridized to a single staple is treated as a single segment of length $\lambda_{ss}$.
Example rate calculations. Consider the half-bound staple shown in Extended Data Fig. 10a that is hybridized to an otherwise empty template. A seam staple, labelled A, is used as an example here. Its second domain can hybridize to either of two sites: the closer is connected by a 448-nt ssDNA chain ($E[r^2]$ = 480 nm²) and the further by a composite chain comprising a 2,208-nt single-stranded chain and one rigid 16-bp double-stranded segment ($E[r^2]$ = 2,400 nm²). Following the calculation outlined above, we find that for the closer site the effective local concentration of the opposing domain is $c_{\mathrm{eff}}$ = 51 μM, the loop cost is $\Delta G^{\mathrm{loop}}$ = 6.5 kcal mol⁻¹ (at T = 60 °C) and the hybridization rate is σ = 50 s⁻¹. For the further site: $c_{\mathrm{eff}}$ = 4.6 μM, $\Delta G^{\mathrm{loop}}$ = 8.1 kcal mol⁻¹ and σ = 4.5 s⁻¹. The staple is 11 times more likely to bind to the closer domain.
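The $E[r^2]$ values in this example follow from the chain parameters given under ‘Parameterization of the model’. In the freely jointed chain, an ssDNA stretch of n bases contributes $n L_{ss} \lambda_{ss}$ (that is, $n L_{ss}/\lambda_{ss}$ Kuhn segments of length $\lambda_{ss}$), and each rigid duplex contributes the square of its length. A short Python sketch with illustrative names; the crossover term assumes a single segment of length $\lambda_{ss}$.

```python
L_SS = 0.6    # ssDNA contour length per base, nm
LAM_SS = 1.8  # ssDNA Kuhn length, nm
L_DS = 0.34   # dsDNA contour length per base pair, nm

def mean_r2(ss_bases=0, duplexes_bp=(), crossovers=0):
    """E[r^2] in nm^2 for a freely jointed chain built from the link types in the text."""
    ss = ss_bases * L_SS * LAM_SS                   # (L / lambda) segments of length lambda
    ds = sum((n * L_DS) ** 2 for n in duplexes_bp)  # each duplex is one rigid link
    xo = crossovers * LAM_SS ** 2                   # assumed: one Kuhn-length segment per crossover
    return ss + ds + xo

near = mean_r2(ss_bases=448)
far = mean_r2(ss_bases=2208, duplexes_bp=(16,))
ratio = (far / near) ** 1.5  # c_eff scales as E[r^2]^(-3/2), eq. (11)
print(round(near), round(far), round(ratio, 1))
```

This gives about 484 nm² and 2,414 nm², in line with the rounded 480 and 2,400 nm² above, and a concentration ratio of about 11.1, consistent with the factor of 11.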


©2015 Macmillan Publishers Limited. All rights reserved 


Binding of one staple affects the binding of others by changing the characteristics of the template (or partly formed origami) that links their two binding domains. We now compute the hybridization rate, loop cost and local concentration for a second seam staple, staple B, in the presence or absence of staple A. In the absence of staple A, the shorter of the two loops that connect the two binding domains of the second staple consists of an 864-nt ssDNA chain: $E[r^2]$ = 980 nm², $c_{\mathrm{eff}}$ = 18 μM, $\Delta G^{\mathrm{loop}}$ = 7.2 kcal mol⁻¹, σ = 18 s⁻¹. In the presence of staple A, the loop passes through the link formed by staple A and comprises 384 nt ssDNA, 3 rigid 16-bp dsDNA segments and a staple crossover modelled as a single segment of length $\lambda_{ss}$ (Extended Data Fig. 10b): for this shortened loop, $E[r^2]$ = 520 nm², $c_{\mathrm{eff}}$ = 46 μM, $\Delta G^{\mathrm{loop}}$ = 6.6 kcal mol⁻¹ and σ = 46 s⁻¹. Insertion of staple A increases the rate of hybridization of the second domain of staple B by a factor of 2.6 by shortening the distance between its binding sites.

Code availability. The code used to implement the folding model is freely available via https://github.com/fdannenberg/dna.
Extended Data Figure 1 | The set of well-folded, planar states. a, Well-folded, planar states can be considered as two adjacent monomer tiles linked by a single reciprocal template crossing at any of the locations marked with a triangle and numbered (centre). This gives a set of 6 unique shapes, as indicated (periphery). b, With the exception noted below, there are four ways to make each of these shapes, distinguished by the nucleotide sequence at the template crossing but not resolved by AFM imaging. In the example shown, crossings made at positions 5 and 8 correspond to the fold 4:1, and crossings made at 17 and 20 correspond to fold 1:4, as indicated in the circle diagrams (left). All give the same shape with fractional short-edge offset w/W of 3/6 (right). The exception is that there are only two variants of state 5:0, as configurations formed by linking tiles at positions 1 and 24 are not distinguishable, nor are links at positions 12 and 13. c, Detailed view of the connection between monomer tiles in this case, for which the long-edge offset is not precisely defined (it can range from 0 to 2/7 depending on the conformation of the long-edge staple). For the purpose of predicting geometry for model configurations, we take an average value of l/L = 1/7. The set of 22 well-folded, planar states thus consists of two folds for the shape shown in c and four folds for each of the other five shapes.
Extended Data Figure 2 | Well-folded, non-planar states and an illegal fold. a, b, The set of legal folds permitted by the model consists of the 22 planar folds defined in Extended Data Fig. 1 and an additional 52 non-planar folds, four for each of the 13 shapes shown here in a, b. Shapes in a are formed by allowing three reciprocal crossings between two tiles; those in b are formed by allowing five reciprocal crossings. These non-planar folds form only rarely in simulation. c, An example of a misfolded shape: the part-folded domains are, individually, well-formed but cannot be joined to give a legal fold.
Extended Data Figure 3 | Fitting the shapes of origami tiles observed by AFM. a, AFM images were flattened by line-by-line subtraction of a second-order polynomial. Image processing and fitting were performed using custom MATLAB programs. Image 1.2 × 1.2 μm. b, A histogram of pixel heights was used to set the threshold for the generation of a binary image. The threshold was found by calculating the average of the means of the two peaks corresponding to background and tiles; if this failed because the image was noisy, the threshold was set manually. c, Well-separated objects in the binary image which have the approximate area of a dimer tile were flagged for fitting (numbered). d, Tile outlines were generated using a Sobel edge-finding filter. e, Representative fitted outlines (two equal, offset parallelograms) were used to classify dimer tiles as described in the text (compare Fig. 3d).
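The thresholding rule in panel b can be sketched as follows. The caption does not say how the two peak means were located; this Python sketch assumes an iterative two-class scheme (isodata, or Ridler–Calvard), in which the threshold is repeatedly reset to the average of the mean heights below and above it until it stops moving. It is not the authors' MATLAB code, and the pixel heights are hypothetical.

```python
def two_peak_threshold(heights, tol=1e-9, max_iter=100):
    """Threshold at the average of the mean background height and the mean tile height.

    Assumed scheme (isodata / Ridler-Calvard): start midway between the extremes,
    split the pixels at the threshold, and move the threshold to the average of the
    two class means until it stops changing.
    """
    t = (min(heights) + max(heights)) / 2.0
    for _ in range(max_iter):
        low = [h for h in heights if h <= t]
        high = [h for h in heights if h > t]
        if not low or not high:
            break  # degenerate split: return the current value (set manually in practice)
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t

# Hypothetical pixel heights (nm): a background peak near 0.1 and a tile peak near 2.0.
heights = [0.1] * 500 + [0.2] * 300 + [1.9] * 200 + [2.0] * 200
print(two_peak_threshold(heights))
```

When the two peaks are well separated the scheme converges in a couple of iterations; for noisy images with no clear second peak it can fail, which matches the caption's fallback to a manual threshold.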
Extended Data Figure 4 | AFM data. Panels a–e show a 1.5 μm field of view containing structures folded from each of the five staple sets of Fig. 4a–e. Shapes that were flagged for fitting are marked with a dot, green if the shape was successfully fitted and red otherwise. The fitted outlines are superimposed on the image. f, Examples of structures that were either not flagged for fitting or not successfully fitted. AFM images are shown alongside the outline of a suggested structure. The collection of shapes that were not successfully fitted includes crowded areas where shapes are touching and shapes where the two component monomer tiles are distorted, perhaps during deposition on the mica surface, but can be clearly assigned to one of the predicted shapes. Part-folded (or damaged) shapes are also observed, often with one well-folded monomer attached to a part-folded monomer; sometimes a portion of unfolded template can be observed.
Extended Data Figure 5 | Strong seam connections influence the folding pathway. The structure labelled 1:1 is a part-folded intermediate in which four pairs of seam staples are bound. If the seam staples remain in place this intermediate could progress to a fully folded structure with seam configuration 4:1 or 3:2 (as indicated by arrows to the right), but fold 5:0 is inaccessible unless two pairs of seam staples dissociate. Circle diagrams in the upper panel show seam connections corresponding to the structures below.
Extended Data Figure 6 | Monitoring origami assembly using fluorescence. Assembly of a monomer tile (Fig. 1) was monitored using fluorescently labelled staples. The positions of the labelled strands in the folded tile are shown in a: the seam staple was labelled with 5′ Cy3 and 3′ Black Hole Quencher 2, and the body staple with 5′ Cy5 and 3′ Black Hole Quencher 2. Reactions containing the monomer template at 50 nM and staples at 100 nM in a buffer containing 12.5 mM MgCl₂, 10 mM Tris-HCl and 0.5 mM EDTA (pH 8.0) were held at 96 °C for 10 min, cooled from 96 °C to 25 °C at 0.3 °C min⁻¹, held at 25 °C for 10 min, then heated to 96 °C at 0.3 °C min⁻¹. The fluorescence signal for Cy3 and Cy5 was recorded at 0.3 °C intervals during cooling and heating cycles. Staple binding increases the separation between fluorophore and quencher and therefore increases the fluorescence intensity. b, Fluorescence intensities (F) and c, their derivatives (dF/dT) as functions of temperature during origami annealing and melting. Sharp transitions, corresponding to narrow ranges of staple incorporation temperatures, are consistent with cooperative origami assembly. In the case of the unmodified tile the seam staple is incorporated into the tile at the same temperature as the body staple. Hysteresis (marked *) is consistent with the cooperative binding of the seam staple. When one half of the seam is broken, the hysteresis observed for seam staple binding is reduced and the seam staple is incorporated at a lower temperature than the body staple. Weakening the seam has little effect on the incorporation of the body staple.
Extended Data Figure 7 | Rearrangement of staples during folding. a–e, Heat maps showing the predictions of the model for the number of reconfiguration events during assembly for each of the staple sets shown in Fig. 4a–e. A ‘reconfiguration event’ occurs when a contact between two template domains is released and replaced by an alternative contact. Domains omitted from the map are those which would generate an illegal fold if reconfigured.
Extended Data Figure 8 | Evolving correlations between seam staples in the model during folding. a, The original staple set (Fig. 4a); b, the broken-seam variant (Fig. 4b). In each case, average data from 1,600 simulations are presented (‘all’) together with subsets sorted by final fold (5:0, 4:1, 3:2 and misfold). The simulation count for each subset is indicated below each panel. Simulations resulting in well-folded, non-planar structures (NP) are included in ‘all’ but not presented separately: such structures occurred 65 times in a and 5 times in b. Circular icons with internal connections of different lengths represent links across the seam (‘seam links’) connecting points on the template spanning (that is, that are separated by) 28, 56, 84, 112 and 140 template domains (as in the ‘circle’ diagrams of Fig. 3). A ‘seam link’ represents a connection across the seam mediated by at least one seam staple (with the original staple set, a, it may also represent a pair of staples). Data are presented at seven different temperatures as the system is cooled. Correlations between seam links are represented graphically by three 5 × 5 blocks. Each pixel represents a correlation between a pair of seam links which are identified by two icons. The orientation of the icons has the same significance as in the ‘circle’ diagrams: two icons related by 180° rotation represent one internal link in each of the two halves of the template; a 90° rotation represents one internal link and one cross-link. Only relative orientation is significant so, for example, fully folded state m:n is not distinguished from n:m (Extended Data Fig. 1b). Each pixel represents the average number of pairs of links present with the specified spans and relative orientations (range 0–8; colour coded, key at top right). The bar on the right of the figure, labelled ‘B’, represents the average occupancy of body staples
(range 0–2). For staple set a, folding is substantially complete at 62 °C: at this temperature the patterns of correlation that are characteristic of the fully folded structures can be seen clearly. For example, the presence of the longest (140-domain) link with no cross-link to the other half of the template is characteristic of fold 5:0. (A 140-domain link with a cross-link only occurs in misfolds and NP structures.) A 112-domain link with a 28-domain cross-link is characteristic of 4:1, and the presence of two 56-domain links including a cross-link is characteristic of 3:2. These and other correlations that are characteristic of the final folds are already visible in the averaged correlation maps (when simulations are sorted by final fold) at very early stages of folding. The pattern of seam staples at an early stage of folding is therefore predictive of the final fold (Extended Data Fig. 9). For the broken-seam staple set b, intact seam staples are incorporated later in the folding pathway (the 50% incorporation temperature for seam staples is 64.2 °C for a, 62.3 °C for b). The 50% body staple incorporation temperature is unchanged (63.9 °C for a, 64.0 °C for b). The same characteristic patterns of seam staples that, with the full seam, are associated with different final folds are also visible at high temperatures for the broken-seam staples. However, 90% of broken-seam simulations result in fold 5:0, as designed. Additional evidence for the influence of strong seam contacts on the folding pathway in the model is provided by the dramatically different yields of misfolds: 52% for full-seam staples a, 1% for broken-seam staples b. Stable incorporation of incompatible seam staples in a prevents the formation of well-folded structures.
Test | 5:0 | 4:1 | 3:2 | N.P. | misfold | all
1. 140-domain link, no cross-link | 117 | 8 | 5 | 5 | 82 | 217
2. 112-domain link with 28-domain cross-link | 16 | 100 | 28 | 12 | 142 | 298
3. two 56-domain links, including a cross-link | 20 | 19 | 155 | 11 | 176 | 381
Total | 231 | 207 | 268 | 65 | 829 | 1,600

Extended Data Figure 9 | Seam-staple correlations at early stages of folding are predictive of the final fold. Data shown correspond to the original staple set (see Extended Data Fig. 8a and Fig. 4a). Three tests were applied at the temperature at which, on average, half of all seam staples are incorporated (64.2 °C). These tests were designed to discriminate between patterns of seam staples characteristic of different final folds. For simulations that satisfy each test, the table records the distribution between final folds. Test 1: a 140-domain seam link with no cross-link to the other half of the template (characteristic of fold 5:0). Test 2: a 112-domain link with a 28-domain cross-link (characteristic of fold 4:1). Test 3: two 56-domain links, including one internal link and one cross-link between halves of the template (characteristic of fold 3:2). Highlighted entries correspond to the fold that each test was designed to predict. The last row of the table records the final distribution between folds of all 1,600 simulations.
Extended Data Figure 10 | Example calculations of staple hybridization rates. See Methods section ‘Example rate calculations’ for the worked examples. a, A half-bound seam staple (brown) can bind to one of two sites on the template (green). Distances along the template to each of the two possible binding sites for the second domain of the staple, measured in nucleotides and base pairs, are marked on the template. In the example shown, the closer binding site is connected by a 448-nt ssDNA chain and the further by a composite chain comprising a 2,208-nt single-stranded chain and one rigid 16-bp double-stranded segment. The local concentration of the closer domain at the half-bound staple is estimated to be 11 times higher than that of the more distant domain, with a correspondingly greater hybridization rate. b, The previous incorporation of staples changes the physical properties of the loops connecting staple binding sites and thus staple incorporation rates. In the absence of staple A, the shortest path between the binding domains of staple B shown consists of an 864-nt ssDNA chain. In the presence of staple A the path is shortened: it passes through the link formed by staple A and comprises 384 nt ssDNA, 3 rigid 16-bp dsDNA segments and a staple crossover. The effect of the previous insertion of staple A, shortening the link between the two binding sites, is to accelerate the hybridization of the second domain of staple B by a factor of 2.6.
Alcohols as alkylating agents in heteroarene C–H functionalization

Jian Jin¹ & David W. C. MacMillan¹


Redox processes and radical intermediates are found in many biochemical processes, including deoxyribonucleotide synthesis and oxidative DNA damage. One of the core principles underlying DNA biosynthesis is the radical-mediated elimination of H₂O to deoxygenate ribonucleotides, an example of ‘spin-centre shift’, during which an alcohol C–O bond is cleaved, resulting in a carbon-centred radical intermediate. Although spin-centre shift is a well-understood biochemical process, it is underused by the synthetic organic chemistry community. We wondered whether it would be possible to take advantage of this naturally occurring process to accomplish mild, non-traditional alkylation reactions using alcohols as radical precursors. Because conventional radical-based alkylation methods require the use of stoichiometric oxidants, increased temperatures or peroxides, a mild protocol using simple and abundant alkylating agents would have considerable use in the synthesis of diversely functionalized pharmacophores. Here we describe the development of a dual catalytic alkylation of heteroarenes, using alcohols as mild alkylating reagents. This method represents the first, to our knowledge, broadly applicable use of unactivated alcohols as latent alkylating reagents, achieved via the successful merger of photoredox and hydrogen atom transfer catalysis. The value of this multi-catalytic protocol has been demonstrated through the late-stage functionalization of the medicinal agents fasudil and milrinone.

During DNA biosynthesis, ribonucleoside diphosphates are converted into their deoxyribonucleoside equivalents via the enzymatic activity of ribonucleotide reductase (classes I–III). Crucially, a (3′,2′)-spin-centre shift occurs, resulting in β-C–O scission and elimination of water (Fig. 1a). Considering the efficiency of this mild enzymatic process in cleaving C–O bonds to generate transient radicals, we wondered whether an analogous chemical process could occur with simple alcohols, such as methanol, to access radical intermediates for use in challenging bond constructions (Fig. 1b). In the medicinal chemistry community, there is growing demand for the direct introduction of alkyl groups, especially methyl groups, to heteroarenes, given their influence on drug metabolism and pharmacokinetic profiles. The open-shell addition of alkyl radical intermediates to heteroarenes, known as the Minisci reaction, has become a mainstay transformation with broad application within modern drug discovery. Unfortunately, many current methods are limited in their application to late-stage functionalization of complex molecules owing to their dependence on the use of strong stoichiometric oxidants or increased temperatures to generate the requisite alkyl radicals. A photoredox-catalysed alkylation protocol using peroxides as the alkyl radical precursors was recently demonstrated. Given the state of the art, we questioned whether a general alkylation protocol could be devised in which a broad range of substituents could be installed from simple commercial alcohols under mild conditions.

Visible light-mediated photoredox catalysis has emerged in recent years as a powerful technique in organic synthesis that facilitates single-electron transfer events with organic substrates. This general strategy allows for the development of bond constructions that are often elusive or currently impossible via classical two-electron pathways. Recently, our laboratory introduced a new dual photoredox-organocatalytic platform to enable the functionalization of unactivated sp³ C–H bonds. This catalytic manifold provides access to radical intermediates via C–H abstraction, resulting in the construction of challenging C–C bonds via a radical–radical coupling mechanism. With the insight gained from this dual catalytic system and our recent work on the development of a photoredox-catalysed Minisci reaction, we questioned whether it would be possible to generate alkyl radicals from alcohols and use them as alkylating agents in a heteroaromatic C–H functionalization reaction (Fig. 1c). While there are a few early reports of alcohols as alkyl radical precursors formed via high-energy irradiation (ultraviolet light and gamma rays), a general and robust strategy for using alcohols as latent alkylating agents has been elusive. This transformation would represent a direct C–H alkylation of heteroaromatics with alcohols via a spin-centre shift pathway, eliminating H₂O as the only by-product. We recognized that this mild alkylating procedure would serve as a powerful and general method in late-stage functionalization, using commercially available and abundant alcohols as latent alkylating agents.

Figure 1 | Bio-inspired alkylation process using alcohols as spin-centre shift
equivalents via a dual catalytic platform. a, DNA biosynthesis occurs via a
spin-centre shift (SCS) process, catalysed by class I ribonucleotide reductase
(RNR), to generate a carbon-centred radical after elimination of H2O as a
by-product. b, Alcohols (for example, methanol) can act as radical intermediates
when a spin-centre shift is allowed. c, Proposed direct installation of alkyl
groups using alcohols under mild photoredox organocatalytic conditions.

Figure 2 | Proposed mechanism for the direct alkylation of heteroaromatic
C-H bonds via photoredox organocatalysis. The catalytic cycle is initiated via
excitation of photocatalyst 1 to give the excited state 2. A sacrificial amount
of heteroarene 3 oxidizes *Ir(III) 2 to Ir(IV) 4, which then oxidizes thiol catalyst 5 to
generate thiyl radical 6 and regenerate catalyst 1. Thiyl radical 6 then abstracts a
hydrogen atom from alcohol 7 to form α-oxy radical 8. Radical 8 adds to
heteroarene 3, producing radical cation 9, which after deprotonation forms
α-amino radical 10. Spin-centre shift elimination of H2O forms radical
intermediate 11. Protonation and reduction by *Ir(III) 2 delivers alkylated
product 12. HAT, hydrogen atom transfer; MeOH, methanol; SET, single-electron
transfer.

A detailed description of our proposed dual catalytic mechanism
for the alkylation of heteroarenes with alcohols is outlined in Fig. 2.
Irradiation of Ir(ppy)2(dtbbpy)+ (1) (in which ppy = 2-phenylpyridine,
dtbbpy = 4,4'-di-tert-butyl-2,2'-bipyridine) will generate the long-lived
*Ir(ppy)2(dtbbpy)+ (2) excited state (τ = 557 ns)*?. As
*Ir(ppy)2(dtbbpy)+ (2) can function as either a reductant or an oxidant,
we postulated that 2 would undergo a single-electron transfer
event with a sacrificial quantity of protonated heteroarene 3 to initiate
the first catalytic cycle and provide the oxidizing Ir(ppy)2(dtbbpy)2+
(4). Given the established oxidation potential of Ir(ppy)2(dtbbpy)2+
(4) (E1/2 = +1.21 V versus saturated calomel electrode in CH3CN)”,
we anticipated that single-electron transfer from the thiol catalyst 5
(Ep(ox) = +0.85 V versus saturated calomel electrode for cysteine)”
to Ir(ppy)2(dtbbpy)2+ (4) would occur and, after deprotonation,
furnish the thiyl radical 6 while returning Ir(ppy)2(dtbbpy)+ (1) to the
catalytic cycle. At this stage, we presumed that the thiyl radical 6 would
undergo hydrogen atom transfer with the alcohol 7 (for a comparable thiol,
methyl 2-mercaptoacetate, the S-H bond dissociation energy is 87 kcal
mol−1 (ref. 24); the methanol α-C-H bond dissociation energy is 96 kcal
mol−1 (ref. 25)) to provide the α-oxy radical 8 and regenerate the thiol
catalyst 5, driven by the polar effect in the transition state’. The polar
effect is a remarkable property that enables considerably endergonic
C-H abstractions (here roughly 9 kcal mol−1 uphill, judging from the bond
dissociation energies above) that would not be possible otherwise”’.
The nucleophilic α-oxy radical 8 would then add to the protonated
electron-deficient heteroarene 3 in a Minisci-type pathway to afford the
aminyl radical cation 9. The α-C-H bond of 9 is sufficiently acidic to
undergo deprotonation to form the α-amino radical 10 (ref. 28). At this
juncture, intermediate 10 is primed to undergo a spin-centre shift to
eliminate H2O and generate benzylic radical 11. The resulting open-shell
species would then undergo protonation followed by a second
single-electron transfer event with the excited photocatalyst 2 to
regenerate the active oxidant Ir(ppy)2(dtbbpy)2+ (4), while providing the
desired alkylation product 12.
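The potentials and bond dissociation energies quoted in the mechanism can serve as a rough consistency check on its two key steps. The sketch below is an illustration and is not part of the authors' analysis. It makes three assumptions. First, it uses cysteine's peak potential as a stand-in for thiol 5, as the text does. Second, it applies ΔG = −nFΔE to the electron transfer. Third, it approximates the HAT enthalpy as the bond broken minus the bond formed, which ignores entropy and the polar effect. The function names are hypothetical.

```python
# Rough thermochemistry for the SET and HAT steps discussed in the text.
# Assumptions (not from the paper): cysteine's peak potential stands in for
# thiol 5, and the bond-dissociation-energy difference approximates the HAT enthalpy.

FARADAY = 96485.33212    # Faraday constant, C mol^-1
J_PER_KCAL = 4184.0      # thermochemical calorie conversion

def electron_transfer_dg_kcal(e_acceptor_v, e_donor_v, n=1):
    """Free energy (kcal/mol) for electron transfer from donor to acceptor.

    Uses dG = -nF(E_acceptor - E_donor); a negative value means downhill.
    """
    return -n * FARADAY * (e_acceptor_v - e_donor_v) / J_PER_KCAL

def hat_dh_kcal(bde_broken, bde_formed):
    """Approximate HAT enthalpy: bond broken minus bond formed (kcal/mol)."""
    return bde_broken - bde_formed

# Ir(IV)/Ir(III) at +1.21 V vs SCE accepting an electron from the thiol (+0.85 V vs SCE)
dg_set = electron_transfer_dg_kcal(1.21, 0.85)
# Methanol alpha-C-H (96) broken, S-H of the comparable thiol (87) formed
dh_hat = hat_dh_kcal(96, 87)

print(f"SET, thiol to Ir(IV): dG ~ {dg_set:.1f} kcal/mol")
print(f"HAT, MeOH C-H to thiyl radical: dH ~ {dh_hat:+d} kcal/mol")
```

Under these assumptions the electron transfer is downhill by roughly 8 kcal/mol, while the HAT step is uphill by about 9 kcal/mol. That uphill step is why the text invokes the polar effect to account for hydrogen atom transfer still taking place.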

We first examined this new alkylation protocol using isoquinoline 
and methanol as the coupling partners, and evaluated a range of 
photocatalysts and thiol catalysts. Using Ir(ppy)2(dtbbpy)PF6 (1)
and ethyl 2-mercaptopropionate (5), along with p-toluenesulfonic acid 
and blue light-emitting diodes as the light source, we were able to 
achieve the desired C-C coupling to provide 1-methylisoquinoline 
(15) with a 92% yield (see Supplementary Information). Notably, we 
observed none of the desired product in the absence of photocatalyst, 
thiol catalyst, acid or light, demonstrating the requirement of all com- 
ponents in this dual catalytic protocol. In addition, this method 
requires only weak visible light and ambient temperature to install 
methyl substituents using methanol as the alkylating agent. 

With the optimal conditions in hand, we sought to evaluate the 
generality of this dual catalytic alkylation transformation. As high- 
lighted in Fig. 3a, a wide range of heteroaromatics are methylated 
under the reaction conditions. Isoquinolines with electron-donating 
or -withdrawing substituents (such as methyl substituents, esters and 
halides) are functionalized in excellent efficiencies (15-18, 85-98% 
yield). Quinolines perform effectively, including those that contain 
non-participating functionality (19-23, 65-95% yield), in addition 
to phthalazine and phenanthridine coupling partners (24 and 25, 
70% and 93% yield). Moreover, a wide range of pyridine derivatives 
containing diverse functionality (such as esters, amides, arenes, nitriles 
and trifluoromethyl groups) can be converted into the desired methy- 
lation products in high yield (26-32, 65-91% yield). 

Next, we sought to investigate the nature of the alcohol coupling 
partner, as demonstrated in Fig. 3b. A broad array of primary alcohols 
can effectively serve as alkylating agents in this new alkylation
reaction. In contrast to the methylation conditions highlighted above,
the alkylations in Fig. 3b typically use methyl thioglycolate 13 as the C-H
abstraction catalyst. Notably, simple aliphatic alcohols such as ethanol 
and propanol deliver the alkylated isoquinoline product in high yields 
(33 and 34, 95% and 96% yield). Steric bulk proximal to the alcohol 
functionality is tolerated, as exemplified by the presence of isopropyl, 
β-tetrahydropyran, β-aryl and β-adamantyl substituents (35-38, 87-92%
yield). The presence of an electron-withdrawing trifluoromethyl
(CF3) group distal to the alcohol decreases the rate of the reaction;
however, the more electrophilic thiol catalyst 2,2,2-trifluoroethanethiol
(14) promotes the transformation more efficiently,
possibly owing to the polar effect on the hydrogen atom transfer 
transition state (39, 93% yield)’*. We found that diols also participate 
readily in this alkylation protocol (40 and 41, 88% and 81% yield). It 
should be noted that 1,3-butanediol demonstrates exceptional che- 
moselectivity and undergoes alkylation exclusively at the primary 
alcohol site. We speculate that the corresponding α-oxy radical at
the secondary alcohol position does not attack the protonated hetero- 
arene owing to its increased steric hindrance. For these alkylating 
agents with several reactive sites (41, 43 and 44), thiol catalyst 5 is 
the most effective hydrogen atom transfer catalyst—mechanistic 
studies are continuing to determine the origin of these differences 
in catalyst reactivity. Ethers, in the form of differentially substituted 
tetrahydrofurans, are also competent alkylating agents in this dual 
catalytic platform (42-44, 72-90% yield). In the elimination step, the 
tetrahydrofuran ring opens to reveal a pendent hydroxyl group. 
Interestingly, 3-hydroxytetrahydrofuran and tetrahydrofurfuryl
alcohol react regioselectively at the ether α-oxy site distal to the alcohol to
afford alkylation products with terminal pinacol motifs. We attribute 
this exclusive regioselectivity to a subtle influence on C-H bond 
dissociation energy owing to the inductive influence of the oxygen 
atoms. The application of these substrates represents an effective 
method to install vicinal diol motifs that would be inaccessible using 
traditional oxidative alkylation methods. Finally, the utility of this
mild alkylation protocol has been demonstrated by the late-stage
functionalization of several pharmaceutical compounds. Using methanol
as a simple methylating agent, fasudil, a potent Rho-associated
protein kinase inhibitor and vasodilator, can be methylated in 82%
yield (product 45). Additionally, milrinone, a phosphodiesterase 3
inhibitor and vasodilator, can be alkylated with 3-phenylpropanol
in 43% yield (product 46).

Figure 3 | Substrate scope for the alkylation of heteroaromatic C-H bonds
with alcohols via the dual photoredox organocatalytic platform. A broad
range of heteroaromatics and alcohols are efficiently coupled to produce
alkylated heterocycles under the standard reaction conditions (top,
generalized reaction). a, A variety of isoquinolines, quinolines, phthalazines,
phenanthridines and pyridines are efficiently methylated using methanol as the
alkylating reagent. b, A diverse selection of alcohols serves as effective alkylating
agents in this dual catalytic protocol. c, Ethers are also amenable to the
transformation; the products are the corresponding ring-opened alcohols.
d, Two pharmaceuticals, fasudil and milrinone, can be alkylated using this
protocol, demonstrating its utility in late-stage functionalization. Isolated yields
are indicated below each entry. See Supplementary Information for
experimental details.

Mechanistic studies have been conducted to support the proposed
pathway outlined in Fig. 2. Stern-Volmer fluorescence quenching
experiments have demonstrated that the *Ir(III) excited state 2 is
quenched in the presence of protonated heteroarene 3, but not in the
presence of the unprotonated heteroarene or thiol catalyst 5, indicating
an oxidative quenching pathway (see Supplementary Information).
Furthermore, a series of experiments were conducted to investigate 
the proposed spin-centre shift elimination. After exposing hydroxy- 
lated intermediate 47 to the reaction conditions, only a modest amount 
of the methylated isoquinoline 15 is observed (8% yield, entry 1, 
Fig. 4a). In the absence of an acid additive, only trace yields of the 
desired product are formed (2% yield, entry 2, Fig. 4a). However, in
the presence of a stoichiometric reductant and p-toluenesulfonic acid,
the elimination of oxygen can be achieved with good efficiency (60%
yield, entry 3, Fig. 4a). Crucially, this elimination pathway is shut down
in the absence of either light or photocatalyst (entry 4 or 5, respectively,
Fig. 4a). Therefore, this net reductive process supports the proposed
generation of α-amino radical 48, which could readily form deoxygenated
product 15 via a spin-centre shift pathway to β-amino radical
49 (Fig. 4b). This elimination pathway is further corroborated by a
series of radical trapping experiments (Fig. 4c and Supplementary
Information). In the presence of styrene, hydroxymethyl arene 47 is
transformed to adduct 50 (65% yield, Fig. 4c), presumably via the
intermediacy of β-amino radical 49. Finally, while we support the
mechanism outlined in Fig. 2, we cannot rule out the possibility of a
radical chain pathway in which radical 11 abstracts an H-atom from
alcohol 7 or thiol catalyst 5.

Figure 4 | Mechanistic studies support spin-centre shift elimination
pathway. a, Hydroxymethyl intermediate 47 can be converted to methylated
15 under net reductive conditions after addition of formic acid-tributylamine
and p-toluenesulfonic acid (TsOH). b, Deoxygenation of 47 probably proceeds
via a spin-centre shift pathway to cleave the alcohol C-O bond. c, In the
presence of styrene, 47 is converted to 50, presumably by trapping of radical
49. DMSO, dimethylsulfoxide; LEDs, light-emitting diodes.
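Stern-Volmer quenching data of the kind cited in the mechanistic studies are commonly analysed with the linear relation I0/I = 1 + K_SV[Q]. K_SV is fitted as the slope of (I0/I − 1) against quencher concentration. The sketch below assumes purely dynamic (linear) quenching. It uses synthetic intensities generated from an assumed K_SV, not the measurements in the Supplementary Information, and the function name is hypothetical. A clear positive slope for protonated heteroarene 3, and essentially zero for thiol 5 or the unprotonated heteroarene, would correspond to the behaviour reported above.

```python
# Linear Stern-Volmer analysis: I0/I = 1 + K_SV [Q].
# All data below are synthetic (generated from an assumed K_SV), not the
# paper's measurements.

def stern_volmer_ksv(quencher_conc, i0, intensities):
    """Least-squares K_SV for a line through the origin of (I0/I - 1) vs [Q]."""
    ys = [i0 / i - 1.0 for i in intensities]
    num = sum(q * y for q, y in zip(quencher_conc, ys))
    den = sum(q * q for q in quencher_conc)
    return num / den

i0 = 1000.0                            # emission without quencher (arbitrary units)
conc = [0.001, 0.002, 0.005, 0.010]    # quencher concentrations (M)
k_true = 50.0                          # assumed K_SV (M^-1) used to generate the data
quenched = [i0 / (1 + k_true * q) for q in conc]   # an effective quencher
unquenched = [i0] * len(conc)                      # a non-quencher leaves I = I0

print(stern_volmer_ksv(conc, i0, quenched))    # recovers the assumed K_SV
print(stern_volmer_ksv(conc, i0, unquenched))  # no quenching gives a zero slope
```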

In summary, this alkylation strategy represents the first, to our 
knowledge, general use of alcohols as simple alkylating agents and 
enables rapid late-stage derivatization of medicinally relevant mole- 
cules. Given the influence on drug pharmacokinetics and absorption, 
distribution, metabolism and excretion (ADME) properties, this 
method of installing inert alkyl groups will probably find wide applica- 
tion in the medicinal chemistry community. We have developed a mild 
and operationally simple alkylation reaction via the synergistic merger 
of photoredox and thiol hydrogen atom transfer organocatalysis to 
forge challenging heteroaryl C-C bonds using alcohols as latent 
nucleophiles. This bio-inspired strategy mimics the key step in 
enzyme-catalysed DNA biosynthesis via a new spin-centre shift elim- 
ination of H2O to generate radical intermediates from simple alcohols. 
Current land surface models assume that groundwater, streamflow
and plant transpiration are all sourced and mediated by the same 
well mixed water reservoir—the soil. However, recent work in 
Oregon’ and Mexico’ has shown evidence of ecohydrological sepa- 
ration, whereby different subsurface compartmentalized pools of 
water supply either plant transpiration fluxes or the combined 
fluxes of groundwater and streamflow. These findings have not 
yet been widely tested. Here we use hydrogen and oxygen isotopic 
data (2H/1H (δ2H) and 18O/16O (δ18O)) from 47 globally distributed
sites to show that ecohydrological separation is widespread
across different biomes. Precipitation, stream water and groundwater
from each site plot approximately along the δ2H/δ18O slope
of local precipitation inputs. But soil and plant xylem waters
extracted from the 47 sites all plot below the local stream water 
and groundwater on the meteoric water line, suggesting that plants 
use soil water that does not itself contribute to groundwater 
recharge or streamflow. Our results further show that, at 80% of 
the sites, the precipitation that supplies groundwater recharge and 
streamflow is different from the water that supplies parts of soil 
water recharge and plant transpiration. The ubiquity of subsurface 
water compartmentalization found here, and the segregation of 
storm types relative to hydrological and ecological fluxes, may be 
used to improve numerical simulations of runoff generation, 
stream water transit time and evaporation-transpiration parti- 
tioning. Future land surface model parameterizations should be 
closely examined for how vegetation, groundwater recharge and 
streamflow are assumed to be coupled. 

Freshwater fluxes via plant transpiration (45,000 km3 yr−1, ref. 3,
to 62,000 km3 yr−1, ref. 4), streamflow (37,000 km3 yr−1 to 40,000 km3
yr−1, refs 5, 6) and groundwater recharge (12,000 km3 yr−1 to
16,200 km3 yr−1, ref. 7) are central components of the terrestrial
hydrosphere. Understanding the sources of water and processes that 
govern each component is important for predicting the effects of global 
change on water security and ecosystem services®. One of the most 
useful tools for quantifying water-cycle components and the linkages 
between plant ecology and physical hydrology is stable-isotope tra- 
cing®. Global isotopic databases developed over the past 60 years’ have 
enabled continental-scale assessments of transpiration/evaporation 
ratios* and the recycling of rainfall back into the atmosphere”. 

While global sets of precipitation’, streamflow’ and groundwater’?
data are now available for analysis, measurements of plant xylem
waters (that is, water moving within plants) remain dispersed throughout
the primary, specialist literature. Synthesizing global groundwater,
streamflow and plant xylem water isotopic data is important because
recent watershed-based case studies have shown evidence of
ecohydrological separation'*, meaning that the soil water that supplies
plant transpiration is isolated from the water that recharges groundwater
and replenishes streamflow. These two recent field studies both
showed that plant transpiration was supplied by waters within unsaturated
soils, but that local streamflow and groundwater were supplied
by mobile water (linked to infiltrating precipitation) that moves
through the soil seemingly unmixed with the waters that are retained
in the soil.

Table 1 | Key information on 47 globally distributed isotopic data sets

Compartmentalization of a poorly mobile plant transpiration water 
pool versus a highly mobile stream/groundwater pool, if widespread, 
would challenge existing land surface model parameterizations that 
assume that plants and streams draw from a single, well mixed subsur- 
face water reservoir’. If true, such widespread ecohydrological sepa- 
ration would also have implications for isotope-based assessments of 
evaporation/transpiration ratios that rely on well mixed systems’. 
Here, we use a new global isotope database to test the ecohydrological 
compartmentalization hypothesis: that the isotopic composition of 
waters that supply plant transpiration differs from that of waters 
that supply groundwater and streamflow. The global ecohydrological 
isotope database consists of 18O/16O and 2H/1H ratios for plant xylem
water (n = 1,460), soil water (n = 1,830), stream water (n = 336), 
groundwater (n = 2,749) and precipitation (n = 488) at 47 globally 
distributed locations (Table 1, Fig. 1). 

Our approach is predicated on the knowledge that precipitation δ2H
and δ18O values (see Methods for definitions) co-vary along a regression
line with a δ2H/δ18O slope of eight (this is the global meteoric
water line, GMWL)”*. Evaporation occurs under disequilibrium conditions
and produces a strong kinetic isotope effect that yields δ2H/δ18O slopes
of less than eight", so water samples that have undergone some
evaporation plot ‘below’ the regression line of precipitation isotopic
data. We use this well known difference between the meteoric water
line and the local evaporation line as a key marker for ecohydrological
compartmentalization’.
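The contrast between a meteoric water line (slope near eight) and an evaporation line (slope below eight) can be shown with an ordinary least-squares fit of δ2H on δ18O. The sketch below is hypothetical throughout. The precipitation points are placed on δ2H = 8δ18O + 10, the "evaporated" samples are enriched along a slope of 5, and the helper name is illustrative. It is not the authors' regression code.

```python
# Ordinary least-squares fit of a water line (d2H on d18O), as used for
# local meteoric water lines. All sample values below are hypothetical.

def fit_water_line(d18o, d2h):
    """Return (slope, intercept) of the OLS regression of d2H on d18O."""
    n = len(d18o)
    mx = sum(d18o) / n
    my = sum(d2h) / n
    sxx = sum((x - mx) ** 2 for x in d18o)
    sxy = sum((x - mx) * (y - my) for x, y in zip(d18o, d2h))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical monthly precipitation lying on d2H = 8 d18O + 10 (GMWL-like)
precip_18o = [-2.0, -5.0, -8.0, -11.0]
precip_2h = [8 * x + 10 for x in precip_18o]

# Hypothetical evaporated samples enriched from (-8, -54) along a slope of 5
evap_18o = [-7.0, -6.0, -5.0]
evap_2h = [-54 + 5 * (x + 8) for x in evap_18o]

a, b = fit_water_line(precip_18o, precip_2h)   # local meteoric water line
m, _ = fit_water_line(evap_18o, evap_2h)       # local evaporation line
# Residual from the LMWL: negative means the sample plots below the line
residuals = [y - (a * x + b) for x, y in zip(evap_18o, evap_2h)]
print(a, b, m, residuals)
```

The evaporated samples all have negative residuals, so they plot below the fitted LMWL. This is the marker of compartmentalization the paragraph describes.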

Figure 1a-d shows isotopic data for groundwater, stream water,
plant xylem water and soil water from our compiled database.
Globally, headwater streams and groundwater plot approximately
along the GMWL. These patterns suggest that stream water and
groundwater follow the local precipitation input signal’*. Plant xylem
and soil waters extracted from the 47 studies plot below the regression
of global meteoric waters, a result of the strong kinetic isotope effect
associated with evaporation”.

Biome | Number of papers | RH (%) | MAT (°C) | MAP (mm yr−1) | LMWL slope | Plant δ2H (‰) | Soil δ2H (‰) | Stream δ2H (‰) | GW δ2H (‰)
Arid | 7 | 49 ± 8.5 | 13 ± 5.2 | 314 (89) | 8.0 (0.3) | −66 (39) | −44 (51) | −73 (15) | −27 (50)
Mediterranean | 6 | 58 ± 7.3 | 15 ± 4.0 | 331 (157) | 7.1 (2.5) | −48 (19) | −43 (27) | −46 (24) | −31 (17)
Temperate forests | 17 | 58 ± 8.5 | 8.9 ± 5.0 | 533 (692) | 8.2 (0.8) | −79 (36) | −79 (23) | −91 (48) | −84 (41)
Temperate grasslands | 7 | 56 ± 5.1 | 16 ± 3.8 | 478 (662) | 7.1 (0.5) | −28 (18) | −28 (10) | −22 (14) | −30 (41)
Tropics | 10 | 65 ± 11 | 23 ± 3.8 | 1350 (1340) | 8.2 (0.3) | −34 (33) | −38 (64) | −7.4 (30) | −14 (10)

RH, relative humidity; MAT, mean annual temperature; MAP, mean annual precipitation; LMWL, local meteoric water line; GW, groundwater. Values are mean ± 1 s.d. or median (interquartile range).

Figure 1 | δ18O and δ2H values of groundwater, stream water, plant xylem
water and soil water at 47 globally distributed sites. The median
(interquartile range) δ18O and δ2H values are: a, groundwater: −7.7 (7.4),
−51.5 (62.6), n = 2,749; b, stream water: −6.2 (8.8), −37.1 (66.9), n = 336;
c, plant xylem water: −5.5 (6.1), −50.6 (50.6), n = 1,460; d, soil water:
−7.5 (7.4), −63.9 (52.2), n = 1,830. The inset in a shows the locations of the
47 globally distributed stable isotopic data sets. The histogram borders show
partitioning of the data sets at 30 equal intervals.

To quantify the similarities or differences between waters used by 
plants and waters that contribute to groundwater and streamflow, we 
use a site-by-site comparison based on a precipitation offset’®: 


$$\text{Precipitation offset} = \frac{\delta^{2}\mathrm{H} - a\,\delta^{18}\mathrm{O} - b}{S} \qquad (1)$$


where a and b are the slope and y intercept, respectively, calculated
from monthly measurements of δ18O and δ2H from local precipitation
at each study site, and S is the one-standard-deviation measurement
uncertainty for both δ18O and δ2H. The precipitation offset describes the
difference in the isotopic composition of environmental waters from 
that of local precipitation, which has, by definition, a precipitation 
offset of zero. The precipitation offset can distinguish hydrological 
processes that occur under chemical equilibrium (for example, the
condensation of vapour’*) from hydrological processes that occur
under disequilibrium (for example, evaporation’). Plant transpiration 
does not affect the precipitation offset, whereas the evaporation of 
meteoric water near the land surface results in precipitation offset 
values of less than zero. By comparing the local precipitation offsets 
of our four water types (that is, soil water, plant xylem water, stream 
water and groundwater), we can use the stable isotopes to distinguish 
evaporated waters from non-evaporated waters and to test whether 
streamflow, groundwater and plant transpiration are supplied by one 
well mixed subsurface water reservoir, or more than one water res- 
ervoir (namely water that is retained in the soil and water that 
recharges groundwater and discharges in streams). 
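Equation (1) and the four-way comparison by water type can be sketched in a few lines. The LMWL parameters, S and the sample values below are hypothetical placeholders, and the helper names are illustrative. The sketch computes only offsets and their medians. It does not reproduce the nonparametric significance tests used in the paper.

```python
# Precipitation offset, equation (1): (d2H - a*d18O - b) / S, grouped by water type.
# The LMWL parameters, S and the samples below are hypothetical placeholders.
from statistics import median

def precipitation_offset(d18o, d2h, a, b, s):
    """Equation (1): zero on the LMWL, negative for water plotting below it."""
    return (d2h - a * d18o - b) / s

def median_offsets(samples, a, b, s):
    """samples: iterable of (water_type, d18O, d2H). Returns {water_type: median offset}."""
    groups = {}
    for kind, d18o, d2h in samples:
        groups.setdefault(kind, []).append(precipitation_offset(d18o, d2h, a, b, s))
    return {kind: median(values) for kind, values in groups.items()}

samples = [
    ("groundwater", -8.0, -54.0),   # on the LMWL
    ("groundwater", -7.0, -45.0),
    ("xylem", -5.0, -39.0),         # evaporated, below the LMWL
    ("xylem", -6.0, -44.0),
]
print(median_offsets(samples, a=8.0, b=10.0, s=1.0))
```

With these placeholder values, groundwater sits near zero and xylem water is clearly negative. This is the qualitative pattern the text uses to distinguish evaporated from non-evaporated pools.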

Figure 2 shows that plant xylem water offsets (median −5.6, interquartile
range 4.7) and soil water offsets (−6.2, 4.4) are significantly different
(P < 0.0001, nonparametric Steel-Dwass method) from the offsets of
groundwater (−1.8, 3.2) and stream water (0.22, 3.7) in all five of the
biomes represented by the 47 sites in our database.
Of our 47 sites, 40 have groundwater precipitation offsets that are 
statistically distinct (P < 0.05 using two-tailed homoscedastic/heteroscedastic
tests, as applicable) from both soil water and plant xylem
water precipitation offsets. Our analysis is suggestive of a widespread 
occurrence of ecohydrological separation—that is, poor and incom- 
plete mixing of subsurface water, with one reservoir of water sustain- 
ing plant transpiration, and another contributing to groundwater 
recharge and streamflow. On a site-by-site basis, groundwater and 
stream water have a precipitation offset that is on average respect- 
ively 5.4 and 4.8 higher (that is, closer to zero) than do soil and plant 
xylem waters. The greatest differences between the precipitation off- 
sets of streamwater/groundwater and plant xylem/soil water are 
found in the tropical and Mediterranean biomes (7.7 and 5.4, respect- 
ively), with smaller differences observed in the arid, temperate grass- 
land, and temperate forest biomes (3.6, 2.4 and 1.6 on average, 
respectively). 

Recent work has shown that different storm types contribute dis- 
proportionately to groundwater recharge (see, for example, refs 11, 
18). Some studies have shown that more intense storms dominate 
groundwater recharge'®; others present evidence to the contrary”. 
While our analyses do not allow us to associate storm intensity with 
either plant transpiration or groundwater recharge fluxes, we can 
nevertheless trace the isotopic composition of the precipitation from 
which plant xylem water originated. We calculated the intersection 
points of local plant xylem evaporation lines with local meteoric water 
lines (LMWLs), that is, the plant xylem δ source value (see Extended
Data Fig. 1 and Methods): 


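$$\delta^{2}\mathrm{H}_{\text{intercept}} = \delta^{2}\mathrm{H} - m\,\delta^{18}\mathrm{O} \qquad (2)$$

$$\delta^{18}\mathrm{O}_{\text{intercept}} = \frac{\delta^{2}\mathrm{H}_{\text{intercept}} - b}{a} \qquad (3)$$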


where m, a and b are the slope of the evaporation line, the LMWL
slope, and the LMWL intercept, respectively.
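Equations (2) and (3) can be transcribed directly. The sketch applies them to a single sample (δ18O, δ2H) with a given evaporation-line slope m and LMWL parameters a and b. All numbers are hypothetical and the function name is illustrative.

```python
# Literal transcription of equations (2) and (3) as written in the text.
# m: evaporation-line slope; a, b: LMWL slope and intercept. Values are hypothetical.

def source_value(d18o, d2h, m, a, b):
    """Return (d18O_intercept, d2H_intercept) following equations (2) and (3)."""
    d2h_intercept = d2h - m * d18o            # equation (2)
    d18o_intercept = (d2h_intercept - b) / a  # equation (3)
    return d18o_intercept, d2h_intercept

x, y = source_value(d18o=-4.0, d2h=-40.0, m=5.0, a=8.0, b=10.0)
print(x, y)
```

By construction, the returned pair satisfies the LMWL relation δ2H = aδ18O + b. Checking this is a quick way to confirm that the inputs were passed in the right order.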

The results of this analysis show that at 80% of the sites (see 
Extended Data Table 1; 37 of 46 sites) where plant xylem water δ
source values can be calculated, groundwater isotope values (median,
interquartile range, P < 0.05 using nonparametric Wilcoxon method)
(−52, 63‰ δ2H) are statistically different from plant xylem water δ
source values (−82, 83‰ δ2H). This suggests that, in many cases (see,
for example, Extended Data Fig. 2), ecologically and hydrologically 
important precipitation is segregated in both space and time, even 
before these waters become further segregated in the subsurface for 
plant transpiration or for groundwater recharge and streamflow (see 
Methods and Extended Data Figs 2 and 4). 

We also use equations (2) and (3) to trace the isotopic composition 
of precipitation from which soil water originated, that is, the soil
water δ source value. We find that at 83% of the sites (Extended
Data Table 1; 29 of 35 sites) where soil water isotopic data are available,
soil water δ source values (−104, 96‰ δ2H) are statistically different
from groundwater isotope values. The significant difference between
soil water δ source and groundwater isotope values suggests that some
forms of precipitation that recharge the subsurface may be more
important than others to plant transpiration fluxes. We assess the
uncertainties in parameter m (equation (2)) and find overall average
uncertainties of 1.07‰ for δ18O and 5.54‰ for δ2H (2σ). These are
slightly smaller than, but comparable to, the prediction uncertainties
in precipitation isotope values (1.17‰ for δ18O and 9.4‰ for
δ2H; ref. 20).

Plants regulate water fluxes from the subsurface to the atmosphere’. 
Our discovery that ecohydrological separation is widespread throughout
the terrestrial water cycle has major implications for isotope-based
estimates of runoff sources”, streamwater residence times?!
and evaporation/transpiration partitioning’. Recent estimates* of 
catchment-scale transpiration/evapotranspiration (T/ET) ratios have 
followed an assumption of well mixed water stores within the critical 
zone, consistent with most land surface parameterizations’’; our 
findings fundamentally challenge this assumption as it relates to 
catchment-based evapotranspiration partitioning*””’ and most land
surface models’”. Our work suggests that downstream water
isotope compositions are biased towards precipitation and ground- 
water source contributions, and do not reflect the composition of water 
seen in soil. This in turn casts doubt on the estimates of transpiration/ 
evapotranspiration made in other studies if based solely on isotope 
data, meaning that evapotranspiration partitioning based on down- 
stream water isotope compositions may not represent an integrated 
catchment-wide isotopic signature as widely applied. 

Notwithstanding these issues, our general finding that transpira- 
tion comprises the greatest fraction of terrestrial evapotranspiration 
is reinforced by the lines of evidence discussed in ref. 4, and by the 
results of land surface models (terrestrial T/ET of 59% to 80%; refs 24, 
25), atmospheric vapour isotope measurements (European T/ET of 
62%; ref. 26), global syntheses of stand-level transpiration measure- 
ments (terrestrial T/ET of roughly 61%; ref. 3), and some but not all 
general circulation models (see refs 27, 28). Although transpiration 
is, indeed, the largest component of terrestrial evapotranspiration*, 
our results show that the mechanisms by which such partitioning 
takes place, and links to other components of the water cycle”, are 
still poorly understood. These combined findings point the way 
towards the research that is needed to understand the ecophysiolo- 
gical basis of ecohydrological separation across biomes. Finally, our 
results also suggest that existing land surface model parameteriza- 
tions of plant physiological processes and runoff* (that is, stream- 
flow) can be made more realistic through the incorporation of 
ecohydrological separation. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Data compilation and treatment. We performed a keyword-based search of 
published literature for stable water isotopes in ecology and hydrology. Because 
ecohydrological separation” is based on the offset of a water sample from the local 
meteoric water line (that is, ‘precipitation offset’'*; equation (1)), we included only 
dual-isotope findings and excluded papers that used either δ2H or δ18O alone.
Stable isotope values from the 47 papers found were then extracted in one of two 
ways: first, where data were reported in tabular form, we compiled the data directly 
into the database; second, where plant xylem and soil water isotope data were not 
reported in tabular form, we used a graphical user interface to extract data points 
from figures in the original paper. We then calculated the precipitation offset 
values on the basis of equation (1). The measurement uncertainty S in equation 
(1) was calculated as: 


$$S = \left[(\delta^{2}\mathrm{H}\ \text{analytical error})^{2} + (\delta^{18}\mathrm{O}\ \text{analytical error})^{2}\right]^{0.5} \qquad (4)$$


Reported analytical errors for δ2H and δ18O are 1‰ and 0.2‰ on average,
respectively.
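With the average analytical errors just quoted, S can be evaluated directly. The sketch reads equation (4) as the root-sum-square of the δ2H and δ18O analytical errors. If the original equation weights the δ18O term differently, for example by the LMWL slope, the function would need to change accordingly. The function name is hypothetical.

```python
import math

def measurement_uncertainty(err_2h, err_18o):
    """Root-sum-square of the d2H and d18O analytical errors (per mil), per equation (4)."""
    return math.sqrt(err_2h ** 2 + err_18o ** 2)

S = measurement_uncertainty(1.0, 0.2)   # average reported errors: 1 and 0.2 per mil
print(round(S, 3))
```

Because the δ2H error dominates, S is only slightly larger than 1‰. An offset of −5 in equation (1) therefore corresponds to a sample lying roughly 5‰ in δ2H below the LMWL.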

We extracted groundwater isotope data for 45 of 47 sites either from the com- 
piled papers (n = 24) or from the comprehensive global groundwater database 
(n = 21) of ref. 11. Of the 21 groundwater data sets compiled using the latter 
database, 16, 2, 1 and 2 data sets are within a 200-, 300-, 400- and 500-km radius 
of actual study sites. The radii within which groundwater data were extracted were 
chosen so that we could build groundwater data sets for most of the 47 sites in our 
database. To test whether or not the choice of radii imposed a scale-dependent 
variation (that is, bias) in isotopic trends, we performed a sensitivity analysis by 
calculating the precipitation offset values of groundwater at distances of 25, 50 and
100 km. We found that precipitation offset values of groundwater did not
differ statistically in space. That is, precipitation offset of groundwater at 25 km
(−3.5 ± 2.2, n = 688) was not statistically different from precipitation offset
at 50 km (−2.5 ± 2.4, n = 1,605), 100 km (−2.4 ± 2.4, n = 3,295), 200 km
(−2.7 ± 2.2, n = 6,598), 300 km (−2.5 ± 4.5, n = 12,000), 400 km (−2.8 ± 4.6,
n = 18,239) and 500 km (−2.8 ± 4.8, n = 24,000). This scale-invariant behaviour
of groundwater precipitation offset supported our choice of radii in building the 
data sets for 45 of 47 sites in our database. It also reinforced one of the key messages 
of this work, in that groundwater isotopes generally fall along the local meteoric 
water line. 
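The radius-based sensitivity analysis amounts to selecting groundwater wells within a given distance of each study site and summarizing their precipitation offsets. The paper does not state how distances were computed. The sketch below assumes great-circle (haversine) distances on a sphere of mean radius 6,371 km. The helper names and the well coordinates are hypothetical.

```python
# Select groundwater wells within a radius of a study site and average their
# precipitation offsets. Haversine distance and Earth radius are assumptions.
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def offsets_within(site, wells, radius_km):
    """Offsets of wells within radius_km of site; wells are (lat, lon, offset) tuples."""
    lat0, lon0 = site
    return [off for lat, lon, off in wells if haversine_km(lat0, lon0, lat, lon) <= radius_km]

def offsets_by_radius(site, wells, radii=(25, 50, 100, 200, 300, 400, 500)):
    """Mean groundwater offset for each radius (None when no well falls inside)."""
    result = {}
    for r in radii:
        selected = offsets_within(site, wells, r)
        result[r] = mean(selected) if selected else None
    return result

wells = [(0.0, 0.1, -2.0), (0.0, 1.0, -3.0)]   # hypothetical wells near the equator
print(offsets_by_radius((0.0, 0.0), wells))
```

Comparing the per-radius means (and their spreads) across the radii is the kind of check described above. Statistical testing of the differences is left out of the sketch.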

To show that plant transpiration water and groundwater recharge are related to 
different storm types, we traced the precipitation δ source value of plant xylem
water by calculating the intersection points of local evaporation lines with local
meteoric water lines (LMWLs) (equations (2) and (3); Extended Data Fig. 1). On a
site-by-site basis, we compared the calculated precipitation δ source value of plant
xylem water and soil water with the mean groundwater δ value (Extended Data
Table 1). 

Comparing plant xylem water δ source values with mean groundwater δ values ideally requires that both be measured as close to each other as possible at a site. The distance of groundwater wells to actual study sites in our database, however, varies from 0 km to almost 500 km. To test whether our approach of comparing both isotope composition values was statistically robust, we ran a sensitivity analysis by comparing plant xylem water δ source values with only the closest groundwater well to a given site. Increasing radii between actual study sites and sites of groundwater measurements were then used as a critical evaluation metric for the approach (Extended Data Fig. 3). For five increasing radius ranges between the actual xylem water study site and the groundwater well site, the differences (median (interquartile range), absolute δ²H ‰) between plant xylem water δ source values and groundwater δ values (24 (29), n = 7; 30 (30), n = 8; 31 (42), n = 7; 21 (22), n = 9; 23 (40), n = 11) are not statistically different from each other (P > 0.90, Tukey-Kramer honest significant difference). This suggests that our approach of comparing plant xylem water δ source values (that is, the intercept of the xylem evaporation line with the LMWL) with the mean groundwater value at a site is valid. We emphasize that this does not imply that groundwater isotope values are invariant in space, but rather that the mean difference between plant xylem water δ source values and mean groundwater values is invariant in space (not statistically different), as shown in Extended Data Fig. 3.
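The distance-bin comparison can be illustrated with a short, standard-library-only Python sketch. The bin edges, variable names and sample numbers are hypothetical. The function reproduces only the median (interquartile range) summaries per bin. The between-bin Tukey-Kramer test itself could be run with a Tukey HSD implementation such as `scipy.stats.tukey_hsd`.

```python
from statistics import median, quantiles


def summarise_by_distance(distances_km, abs_diffs, edges_km):
    """Median and IQR of |source d2H - groundwater d2H| per distance bin.

    edges_km are ascending bin edges; a distance d falls in bin k when
    edges_km[k] <= d < edges_km[k + 1]. Returns (label, median, IQR, n) tuples;
    median and IQR are None when a bin holds fewer than two values.
    """
    bins = [[] for _ in range(len(edges_km) - 1)]
    for d, diff in zip(distances_km, abs_diffs):
        for k in range(len(edges_km) - 1):
            if edges_km[k] <= d < edges_km[k + 1]:
                bins[k].append(diff)
                break
    summary = []
    for k, values in enumerate(bins):
        label = f"{edges_km[k]}-{edges_km[k + 1]} km"
        if len(values) < 2:
            summary.append((label, None, None, len(values)))
            continue
        q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
        summary.append((label, median(values), q3 - q1, len(values)))
    return summary


# Hypothetical sites: distance to nearest well (km) and absolute difference (per mil).
dist = [5, 15, 25, 35, 120, 260, 410]
diff = [20, 30, 22, 26, 40, 18, 25]
for row in summarise_by_distance(dist, diff, [0, 50, 100, 500]):
    print(row)
```

Quartile conventions differ between software packages, so IQR values from this sketch may not match those of other statistical tools exactly.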

We make a distinction between two phenomena: ‘segregation’ of storm types and ‘ecohydrological separation’. The former is related to source precipitation analysis (equations (2) and (3)), the latter to the fate of these waters either as groundwater or as plant transpiration (equation (1)). Segregation of storm types and ecohydrological separation in space are ubiquitous in the global data set. We cannot test for both phenomena in time at every site because of limitations in the information available in the compiled source papers: only where a source paper has data for at least two time points (usually at contrasting moisture conditions) can we explore temporal contrasts (38 of 47 sites). For the 38 sites that satisfy this criterion, storm-type segregation and ecohydrological separation exist in 30 and 32 of 38 sites, respectively (P < 0.05, nonparametric Wilcoxon method).

We recognize that non-weighted plant xylem water isotope values would be biased towards values from periods when transpiration rates are low. To test the robustness of the precipitation offset parameter, we also calculate the transpiration-amount-weighted isotopic composition of plant xylem water, δxyl(weighted), using compiled long-term, global, biome-level transpiration rate estimates:


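$$\delta_{\mathrm{xyl(weighted)}} = \frac{\sum_{i=1}^{n} \delta_{\mathrm{xyl}(i)}\,T_i}{\sum_{i=1}^{n} T_i} \qquad (5)$$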
where δxyl(i) represents the isotopic composition of xylem water during sampling month i (i = 1, …, n), and T_i represents the amount of transpiration during month i. As illustrated in Fig. 2, both transpiration-amount-weighted and non-weighted plant xylem precipitation offsets are statistically different from zero, supporting our primary conclusion that plant transpiration water chemistries are different from groundwater and streamflow at 40 of 47 locations. We apply no amount weighting to groundwater isotope values, in agreement with observations that showed little change in groundwater isotopic composition on timescales of years and decades.
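Equation (5) is a transpiration-weighted mean. A minimal Python sketch with hypothetical monthly values shows how the weighting pulls the estimate towards months of high transpiration:

```python
def transpiration_weighted_delta(delta_xyl, transpiration):
    """Equation (5): sum(delta_xyl(i) * T_i) / sum(T_i) over sampling months i."""
    if len(delta_xyl) != len(transpiration):
        raise ValueError("need one transpiration amount per xylem value")
    total_t = sum(transpiration)
    if total_t <= 0:
        raise ValueError("total transpiration must be positive")
    return sum(d * t for d, t in zip(delta_xyl, transpiration)) / total_t


# Hypothetical d2H of xylem water (per mil) for two sampling months and the
# transpiration amounts in those months (arbitrary units).
weighted = transpiration_weighted_delta([-60.0, -40.0], [10.0, 30.0])
unweighted = (-60.0 + -40.0) / 2
print(weighted, unweighted)  # -45.0 -50.0
```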

To trace the fate of water after precipitation (that is, either as groundwater recharge or as plant water uptake), we quantified the precipitation offset from the LMWL (equation (1)). We confirmed ecohydrological separation at a study site if plant xylem water and soil water isotopic compositions fall below the regression of δ²H on δ¹⁸O values in local precipitation, that is, below the LMWL.
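Equation (1) is not reproduced in this section. The sketch below assumes, for illustration only, that the offset of a sample is its δ²H deviation from the LMWL, δ²H − (a·δ¹⁸O + b), in the spirit of line-conditioned excess. The published equation (1) may differ, for example by a normalization. Negative values indicate samples plotting below the LMWL. The function names and sample values are hypothetical.

```python
from statistics import median


def lmwl_offset(d18o, d2h, slope, intercept):
    """Assumed form: d2H deviation of one sample from the LMWL (per mil)."""
    return d2h - (slope * d18o + intercept)


def site_offset(samples, slope, intercept):
    """Median offset of (d18O, d2H) samples and whether it lies below the LMWL."""
    offsets = [lmwl_offset(x, y, slope, intercept) for x, y in samples]
    med = median(offsets)
    return med, med < 0


# Hypothetical LMWL d2H = 8 * d18O + 10 and three soil-water samples.
soil = [(-5.0, -40.0), (-4.0, -30.0), (-6.0, -45.0)]
print(site_offset(soil, 8.0, 10.0))  # (-8.0, True)
```

This sign check is only a descriptive summary; the statistical criteria actually used are described under Statistical analysis.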

Conventional notation for isotope composition is used, where $\delta = (R_{\mathrm{sample}}/R_{\mathrm{standard}} - 1) \times 1{,}000$‰ and R is the ratio ¹⁸O/¹⁶O (δ¹⁸O) or ²H/¹H (δ²H) in the sample or in the international standard (Vienna Standard Mean Ocean Water, V-SMOW).
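The δ definition converts directly into code. This minimal Python sketch uses commonly cited V-SMOW reference ratios (about 2005.20 × 10⁻⁶ for ¹⁸O/¹⁶O and 155.76 × 10⁻⁶ for ²H/¹H). They should be checked against the IAEA reference documentation before any quantitative use. The function names are hypothetical.

```python
# Commonly cited isotope ratios of the V-SMOW standard.
R_VSMOW = {"18O/16O": 2005.20e-6, "2H/1H": 155.76e-6}


def delta_permil(r_sample, r_standard):
    """delta = (R_sample / R_standard - 1) * 1000, in per mil."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta, r_standard):
    """Invert the delta definition to recover the sample isotope ratio."""
    return r_standard * (1.0 + delta / 1000.0)


r_std = R_VSMOW["2H/1H"]
r_sample = ratio_from_delta(-50.0, r_std)  # water depleted by 50 per mil in 2H
print(round(delta_permil(r_sample, r_std), 6))  # -50.0
```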

Statistical analysis. Parametric requirements of normality and equal variances, particularly for aggregate precipitation offset values, are not satisfied even after attempts to transform the data. We therefore test whether groups are similarly located using nonparametric tests, which use functions of the response ranks (or rank scores). A Kruskal–Wallis/Steel–Dwass method is performed to test whether or not the precipitation offset values of the water types (groundwater, stream water, plant xylem water and soil water) differ statistically from each other. We perform a similar nonparametric test (Dunn all-pairs for joint ranks method) by computing ranks on all the data; the results are the same as those from the pairwise Kruskal–Wallis/Steel–Dwass test. To test whether each water type is statistically different from zero (that is, from the precipitation offset value of local precipitation), the Dunn method for joint ranking is performed. The test shows that plant xylem water and soil water are statistically different from zero, while groundwater and stream water are not. This result supports the interpretation that groundwater and stream water fall along the δ²H/δ¹⁸O slopes of local meteoric water lines, while plant xylem water and soil water fall ‘below’ the slopes of this linear regression.
The same method is also used to test for statistical significance of precipitation offset values of each water type across biomes. These nonparametric tests are based on ranks and control for the overall alpha level (α = 0.05). The Dunn method, which reports P values after a Bonferroni adjustment, is used to correct for the multiple-testing problem that may arise from an inflated type I error rate (0.0001 ≤ P ≤ 0.05). Where parametric requirements are met, particularly for intrasite tests on water types, Student's t/Tukey-Kramer HSD tests are performed as applicable. Uncertainty estimation, particularly for the parameters of equations (2) and (3), is performed with the jackknife approach.
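As a minimal illustration of the rank-based approach (not the authors' software), the Python sketch below computes the Kruskal–Wallis H statistic with tie-averaged ranks and the standard tie correction. The post hoc Steel–Dwass and Dunn comparisons and the Bonferroni-adjusted P values are not reproduced. For real analyses, `scipy.stats.kruskal` returns the same H together with a P value. The group data are hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Ranks 1..N of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # divide by 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    if ties == n ** 3 - n:
        raise ValueError("all values are tied; H is undefined")
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical precipitation offsets (per mil) for two water types.
groundwater = [-0.5, 0.3, -1.1, 0.8]
soil_water = [-6.2, -4.1, -7.4, -5.5]
print(kruskal_wallis_h(groundwater, soil_water))
```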

A mechanism for ecohydrological separation. Partial mixing of ‘new’ (incoming) and ‘old’ (resident) water in the subsurface is rarely considered in conceptual models. Our key finding that groundwater/stream water and soil/plant uptake water are fundamentally (physically and temporally) separated supports the dynamic partial mixing model of ref. 37. In fact, it was the contrasting conclusions drawn by ref. 1 compared with those of refs 38 and 39 regarding the mixing mechanisms that led the authors of ref. 37 to propose the following dimensionless mixing coefficient, C_M,i, controlled mainly by soil moisture content:


$$C_{M,i} = \frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\!\left(\frac{S_u/S_{u,\max} - \mu_{CM,i}}{\sigma_{CM,i}\sqrt{2}}\right) \qquad (6)$$
where S_u and S_u,max are the actual storage and the storage capacity within the root zone, respectively; μ_CM,i and σ_CM,i are location and shape parameters, respectively; and i is the storage compartment. Equation (6) is applied to tracer (for example, stable water isotope) balance equations, which may then enable functional comparisons with alternative diagnostic models (for example, the more widely used complete mixing model).
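Equation (6), in the form reconstructed here, is the complement of a normal cumulative distribution function in the relative root-zone storage S_u/S_u,max. As a result, C_M,i falls from 1 towards 0 as storage rises past μ_CM,i. The Python sketch below uses only `math.erf`. The parameter values are hypothetical.

```python
import math


def mixing_coefficient(storage, storage_max, mu, sigma):
    """Equation (6): C_M,i = 1/2 - 1/2 * erf((S_u/S_u,max - mu) / (sigma * sqrt(2)))."""
    if storage_max <= 0 or sigma <= 0:
        raise ValueError("storage_max and sigma must be positive")
    z = (storage / storage_max - mu) / (sigma * math.sqrt(2.0))
    return 0.5 - 0.5 * math.erf(z)


# Hypothetical location and shape parameters for one storage compartment.
MU, SIGMA = 0.5, 0.15
for storage in (20.0, 50.0, 80.0):  # root-zone storage, mm (S_u,max = 100 mm)
    print(storage, round(mixing_coefficient(storage, 100.0, MU, SIGMA), 3))
```

At S_u/S_u,max = μ_CM,i the coefficient is exactly 0.5, and σ_CM,i controls how sharply it switches between the two regimes.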


Our precipitation offset parameter analysis (equation (1)) is used to modify equation (6) by substituting the precipitation offset value of soil water for the term S_u/S_u,max:

$$C_{M,i} = \frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\!\left(\frac{|P_o| - \mu_{CM,i}}{\sigma_{CM,i}\sqrt{2}}\right) \qquad (7)$$
where |P_o| is the absolute value of the precipitation offset parameter. This results in a dimensionless mixing coefficient C_M,i whose value decreases as the precipitation offset |P_o| increases. When C_M,i is applied in tracer mass balance equations (as outlined in ref. 37), mixing between ‘new’ and ‘old’ water increases as soil moisture decreases; conversely, separation between ‘new’, ‘fast-flowing’ waters and ‘old’, ‘matrix’ waters increases with higher antecedent soil moisture. The persistence of ‘old’ water within the soil matrix, and its reduced participation in dispersive and diffusive exchange with preferential flow path water, lead to continued exposure to evaporation (stage 1, capillary action, and stage 2, vapour diffusion). For details regarding evaporation from porous media, see ref. 40.

Our conceptual formulation as outlined in equation (7) is supported by the results of our precipitation offset analysis, which provides a site-by-site (Extended Data Table 2) and biome-level (Extended Data Table 3) quantification of the magnitude of separation (and, by extension, mixing) between groundwater recharge and stream discharge on the one hand, and the water that recharges the soil matrix and is taken up by plants for transpiration on the other. Extended Data Table 3 shows that in soils of the arid biome the precipitation offset value is highest (that is, closest to zero); conversely, in soils of the humid tropics, where antecedent soil wetness is high, the precipitation offset value is lowest. Calculating the dimensionless mixing coefficient C_M,i from equation (7) with the precipitation offset values in Extended Data Table 3 supports the observation that mixing between new, fast-flowing waters and old, matrix waters increases in the dry soils of the arid biome. The opposite is true at the other extreme, in humid tropical soils, where antecedent soil wetness is high. Because plants in our compiled database generally use soil water, these precipitation offset trends in soils are consistent with the plant xylem water data. That is, the magnitude of ecohydrological separation (plants using evaporated soil water that is isotopically distinct from groundwater recharge and stream discharge) increases with antecedent soil wetness. The relationship between soil wetness and the dimensionless mixing coefficient C_M,i is discussed in detail, and tested with actual, long-term catchment-level data, in ref. 37. However, we note a caveat: the use of the precipitation offset parameter in equation (7) may be considered a coarse (first-order) approximation, given the nonlinear relationship between evaporative loss and the precipitation offset parameter.
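The biome-level reasoning can be reproduced with a short Python sketch that applies equation (7), in the form reconstructed here, to the median offsets of Extended Data Table 3. The values of μ_CM,i and σ_CM,i are hypothetical. For any positive σ the coefficient decreases monotonically with |P_o|, so the resulting ranking of biomes does not depend on those two parameters.

```python
import math

# Median soil-water precipitation offsets by biome (Extended Data Table 3).
BIOME_OFFSETS = {
    "Arid": -3.7,
    "Mediterranean": -7.0,
    "Temperate forests": -5.2,
    "Temperate grasslands": -5.1,
    "Tropics": -9.3,
}


def mixing_coefficient_from_offset(offset, mu, sigma):
    """Equation (7): C_M,i = 1/2 - 1/2 * erf((|P_o| - mu) / (sigma * sqrt(2)))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (abs(offset) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 - 0.5 * math.erf(z)


# Hypothetical parameters; the ranking below does not depend on them.
MU, SIGMA = 6.0, 2.0
coefficients = {
    biome: mixing_coefficient_from_offset(p_o, MU, SIGMA)
    for biome, p_o in BIOME_OFFSETS.items()
}
ranked = sorted(coefficients, key=coefficients.get, reverse=True)
print(ranked)  # most mixing (arid) first, least mixing (tropics) last
```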

While ref. 1 was the first paper to develop the ecohydrological separation concept and was relatively successful at proposing a mechanistic explanation for the observed results, other work has shown that such a mechanism may not universally explain the observed ecohydrological separation. For example, ref. 2 also found ecohydrological separation in a seasonally dry cloud forest in Mexico, but its authors argued that the mechanism proposed in ref. 1 was not likely to explain the isotopic separation observed in their study. The plant xylem water values in ref. 1 are more enriched than most of the soil water values, the opposite case to ref. 2. If the ‘first in, last out’ mechanism proposed by ref. 1 were correct, then the measured plant xylem values should have matched (or at least been bounded by) the measured soil water values; the data in ref. 1 suggest that this was not the case. In contrast, the authors of ref. 2 observed their plant xylem water values to lie entirely between precipitation and bulk soil water values. The aggregate result from our global data set (Extended Data Fig. 4) lends more support to the interpretation of ref. 2 than to that of ref. 1.

Water extraction techniques. As underlined in our central message, plant xylem water and soil water isotopes plot ‘off’ the LMWLs, supporting the idea of a widespread occurrence of ecohydrological separation on a global scale. This finding holds across the different techniques used to extract water from soil and plant stem samples in our data set. The authors of ref. 1 argued that plant transpiration is supplied by ‘tightly bound’ waters within unsaturated soils. This interpretation was inferred from the laboratory technique used to extract water from a soil sample (cryogenic vacuum distillation), which uses suction pressures that are orders of magnitude greater than those used in other field techniques (for example, suction lysimetry). Potential nuances in the fidelity of water extraction from soil samples using existing laboratory techniques have recently been explored. These findings suggest that soil physicochemical characteristics may contribute to isotopic fractionation, specifically with respect to δ¹⁸O. We explored the relationship between water extraction techniques and plant xylem water/soil water δ¹⁸O in our data set.
Extended Data Fig. 5 shows plant xylem water/soil water δ¹⁸O values obtained by liquid–vapour equilibration, cryogenic vacuum distillation and azeotropic distillation. Although there are statistically significant differences (P < 0.0001, nonparametric Dunn method for joint ranking) between liquid–vapour equilibration methods (n = 204) and both cryogenic vacuum (n = 2,640) and azeotropic distillation (n = 441), there is no significant difference in plant xylem water δ¹⁸O between the two more widely used techniques, cryogenic vacuum and azeotropic distillation (P = 0.35, nonparametric Dunn method for joint ranking). Despite these differences in δ¹⁸O of plant xylem water and soil water with respect to water extraction techniques, both water types plot ‘off’ the LMWL in dual-isotope space. This suggests that ecohydrological separation exists beyond any differences in soil water δ¹⁸O that are related to different water extraction techniques.

Global map of plant xylem water δ²H and δ¹⁸O. To our knowledge, we provide for the first time not only a global map of plant xylem δ²H and δ¹⁸O, but also their relationship to the respective LMWLs, as integrated in the precipitation offset parameter, a fundamental descriptor of ecohydrological separation (Extended Data Fig. 6). Our compilation of global plant xylem δ²H and δ¹⁸O may complement other existing large-scale isotopic data sets from precipitation and streams in pursuing future research questions related to plant–water relations from continental to global scales.
Extended Data Figure 1 | Schematic representation of tracing the isotopic composition of source precipitation. Plant xylem water isotopic values plot on a linear regression called the evaporation line. The point where the plant xylem water evaporation line intersects the local meteoric water line (LMWL) provides a good approximation of the mean isotopic value of plant xylem source precipitation. The same method is used to trace the soil water δ source value.
Extended Data Figure 2 | Tracing the isotopic composition of plant xylem source precipitation versus mean groundwater value. Plant xylem water (grey triangles, n = 88) plotted in δ¹⁸O–δ²H space. Shown are the mean plant xylem source precipitation value (green triangle with error bars, ±1 s.d., n = 88), mean groundwater value (blue circle with error bars, ±1 s.d., n = 271), amount-weighted average precipitation (yellow star), GMWL (solid black line) and LMWL (dashed black line). This is an example of a case in Oregon, USA (ref. 1) where the mean groundwater isotope value is more positive than the plant xylem source precipitation value. This is the case in 41 of 47 sites in our database.
Extended Data Figure 3 | The difference between plant xylem δ-source precipitation values and mean groundwater δ²H values, plotted against increasing distance of groundwater locations from actual plant xylem study sites. The extents of the boxes show the 25th and 75th percentiles; whiskers show the extents of outliers. Also shown are median (interquartile range) values (P > 0.90, Tukey-Kramer honest significant difference) for five (n = 7; n = 8; n = 7; n = 9; n = 11) arbitrary distance ranges.
[x axis: distance between plant δ-source precipitation site and nearest groundwater location (km)]
Extended Data Figure 4 | Groundwater and plant xylem source precipitation. Plot of δ¹⁸O versus δ²H for global plant xylem water (green triangles, n = 1,460), soil water (grey circles, n = 1,830) and groundwater (blue circles, n = 2,749). Also shown are the isotopic composition of source precipitation that leads to groundwater recharge (blue circle with error bars, mean ± 1 s.d.) and precipitation that leads to plant water uptake (green triangle with error bars, mean ± 1 s.d.). The inset shows the linear regression of plant xylem water and soil water, forming distinct evaporation lines (ELs) whereby, at a site level, plant xylem water is completely bounded by soil water. Also shown are GMWL and LMWL in the main plot and inset, respectively.
Extended Data Figure 5 | Comparison of plant xylem (black boxes) and soil water (grey boxes) δ¹⁸O, based on water extraction techniques. Cryogenic vacuum (n = 2,640) and azeotropic distillation (n = 441) are significantly different from liquid–vapour equilibration methods (n = 204) (P < 0.0001, nonparametric Dunn method for joint ranking). Cryogenic vacuum and azeotropic distillation are not significantly different from each other (P = 0.35, nonparametric Dunn method for joint ranking). The extents of the boxes show the 25th and 75th percentiles; whiskers show the extents of outliers. Also shown are median (interquartile range) values for each water type and water extraction technique.
Extended Data Figure 6 | Global map of plant xylem water precipitation offsets from 47 study sites.
Extended Data Table 1 | Site-by-site source precipitation δ values for plant xylem water, groundwater and soil water

Site ID  Location  Plant xylem (δ²H)  Groundwater (δ²H)  Soil water (δ²H)  Plant vs GW  Soil vs GW
1 Texas A, US*® -38 (27) -25 (1.8) -33 (27) N.S. ” 
2 Arizona A, US*” -131 (16) -96 (22) -91 (11) “ N.S. 
3 Arizona B, US# -96 (43) -56 (13) -45 (53) + N.S. 
4 Arizona-Utah, US* -125 (30) -81 (0.5) -160 (24) * * 
5 Northwest CN®° -91 (54) -81 (17) -30 (49) N.S. N.S. 
6 Junggar Plain, CNS" -109 (9.9) -49 (13) -87 (40) * + 
7 Ningxia Plain, CNS -139 (26) -76 (15) -149 (21) * 
8 California A, US -86 (31) -44 (5.5) -57 (43) “ 
9 Cape Town, ZA% -58 (29) -33 (20) : a E 
10 Evora, PTS -46 (2.7) -29 (2.3) -51 (10) + * 
11 Eyre Peninsula, AU®® -31 (10) -24 (1) -15 (6.4) = sid 
12 Mt. Natl Park, ZA®7 -18 (13) -18 (16) Z N.S. : 
13 Ordos Plateau, CNS -113 (33) -63 (13) -94 (34) * * 
14 Hubei Province, CN®° -114 (34) -66 (6.8) -125 (50) a = 
15 N. Carolina, US® -52 (13) -40 (4.1) -56 (12) * + 
16 Beijing, CN® -90 (6.4) -64 (23) -102 (8.1) a + 
17 Guizhou Prov, CN® -152 (51) -49 (7.2) -145 (54) * + 
18 New Hampshire, US? -103 (29) -52 (9.2) - sa - 
19 Shanxi Prov, CN®* -98 (8.8) -62 (4.8) -108 (29) + * 
20 Horgin, CN® -147 (21) -75 (8) -141 (41) + * 
21 Queensland, AU® -18 (23) -30 (3.3) - N.S. - 
22 Sichuan A, CN®7 -120 (9.6) -86 (12) -131 (33) * + 
23 Colorado A, US® -214 (34) -107 (17) -190 (38) ae * 
24 Colorado B, US® -156 (141) -92 (13) 2 *s : 
25 Sierre, CH? -151 (39) -106 (5.8) -142 (56) + * 
26 Oregon, US! -130 (10) -100 (53) -132 (36) + + 
27 Sichuan B, CN”! -127 (10) -86 (12) -124 (11) * + 
28 Pre-Alpine, IT’2 - -56 (2.1) - - - 
29 Utah, US” -128 (69) -114 (23) N.S. = 
30 California B, US -128 (21) -92 (8.4) -126 (49) * * 
31 Victoria, AU’ -29 (32) -35 (0.3) -43 (22) N.S. N.S. 
32 Texas B, US” -64 (12) -29 (2.6) z + : 
33 Nebraska, US”” -134 (42) -73 (11) - N.S. - 
34 River Murray A, AU -14 (40) -33 (7) -29 (49) N.S. N.S. 
35 Texas C, US”? -25 (18) -17 (1.6) . “ : 
36 Texas D, US® -66 (37) -26 (3.8) -40 (29) ws * 
37 River Murray B, AU®" -42 (7.7) -30 (3.3) -36 (14) * * 
38 West Central, MX® -216 (43) -71 (23) -242 (105) ae a 
39 Niger®? -28 (4.9) -41 (16) -16 (25) = 
40 Limpopo, ZA* -64 (19) -19 (15) - a - 
41 Cameroon® -45 (30) -15 (2.7) - * - 
42 Guangxi A, CN® -146 (26) -49 (14) -104 (119) * N.S. 
43 Luquillo-Susua, PR®” -27 (19) -8.8 (7.5) -14 (26) a + 
44 Veracruz, MX? -67 (23) -71 (23) -126 (53) N.S. * 
45 Guangxi B, CN® -105 (45) -49 (7.2) -142 (113) * se 
46 Rio de Janeiro, BR®® -32 (44) -15 (2.7) 10.4 (23) bi 
47 Sardinilla, PA®° -54 (22) -15 (2.7) -58 (29) ae a 


Plant xylem and soil water δ source precipitation values (median (interquartile range)) are calculated using equations (2) and (3). The last two columns show whether or not the source precipitation values are statistically different amongst the three water compartments. N.S., not significant. Superscript numbers after site locations refer to the source papers (refs 46–90).
*Denotes statistically significant difference (α = 0.05).
Extended Data Table 2 | Site-by-site soil water precipitation offset values

Site ID  Precip offset (soil)
1  -6.2 (1.1)
2  -0.2 (1.8)
3  -4.1 (1.6)
4  -7.4 (1.9)
5  -0.7 (4)
6  -5.6 (5.4)
7  -1.5 (0.6)
8  -7.5 (1.6)
10  -5.5 (3)
11  -3.6 (0.2)
13  -0.8 (1.2)
14  -4.5 (1)
15  -3 (0.9)
16  -1.4 (0.7)
17  -6 (1.1)
19  -4.5 (1.3)
20  1.78 (1.2)
22  -9.7 (1.2)
23  -6.5 (0.6)
25  -1.1 (1.1)
26  -5.6 (1)
27  -4.7 (0.6)
28  -6.4 (0.3)
30  -4 (2.4)
31  -5.4 (1)
34  -5.5 (1)
36  -4.1 (1.1)
37  -5 (1.3)
38  -3.9 (0.8)
39  -2.4 (2.5)
42  -12 (3.3)
43  -8.2 (0.9)
44  -10 (1.2)
45  -8.4 (1.4)
46  -9.9 (2.5)
47  -3.3 (0.8)

Values are median (interquartile range).
Extended Data Table 3 | Biome-level soil water precipitation offset values

Biome  Precip offset (soil)
Arid  -3.7 (4.6)
Mediterranean  -7 (3.6)
Temperate forests  -5.2 (2.4)
Temperate grasslands  -5.1 (1.1)
Tropics  -9.3 (2.2)

Values are median (interquartile range).
Broad plumes rooted at the base of the Earth’s mantle beneath major hotspots

Scott W. French & Barbara Romanowicz


Plumes of hot upwelling rock rooted in the deep mantle have been proposed as a possible origin of hotspot volcanoes, but this idea is the subject of vigorous debate. On the basis of geodynamic computations, plumes of purely thermal origin should comprise thin tails, only several hundred kilometres wide, and be difficult to detect using standard seismic tomography techniques. Here we describe the use of a whole-mantle seismic imaging technique, combining accurate wavefield computations with information contained in whole seismic waveforms, that reveals the presence of broad (not thin), quasi-vertical conduits beneath many prominent hotspots. These conduits extend from the core-mantle boundary to about 1,000 kilometres below Earth’s surface, where some are deflected horizontally, as though entrained into more vigorous upper-mantle circulation. At the base of the mantle, these conduits are rooted in patches of greatly reduced shear velocity that, in the case of Hawaii, Iceland and Samoa, correspond to the locations of known large ultralow-velocity zones. This correspondence clearly establishes a continuous connection between such zones and mantle plumes. We also show that the imaged conduits are robustly broader than classical thermal plume tails, suggesting that they are long-lived, and may have a thermochemical origin. Their vertical orientation suggests very sluggish background circulation below depths of 1,000 kilometres. Our results should provide constraints on studies of viscosity layering of Earth’s mantle and guide further research into thermochemical convection.

More than 40 years ago, Morgan proposed that hotspot volcanoes are the surface expression of narrow plumes of hot material that originate in a boundary layer in the deep mantle, as one would expect of a convecting fluid that is heated from below. Whether deep mantle plumes exist, and how deep their roots are, has been the subject of lively debate, which continues to this day. The idea that hotspots may be anchored at the core-mantle boundary (CMB) is supported by several observations: the relative fixity of hotspots with respect to global mantle circulation; the correlation of hotspot locations with the large low shear velocity provinces (LLSVPs) at the base of the mantle; and a suggestion from geodynamic modelling that hotspots might preferentially occur above ultralow-velocity zones (ULVZs). A radically different origin for hotspots has also been proposed, in which these features are the consequence of melting owing to shallow convective processes, with their morphologies controlled by stresses and cracks within the lithosphere.

In the classical view, a mantle plume is composed of a large head and a thin tail, which connects it to a root deeper in the mantle. If such plumes were to originate at the base of the mantle, we would expect the lower mantle to contain narrow (less than 200 km in diameter, according to relevant scaling relations), continuous, vertically oriented columns of hotter-than-average (and therefore low-seismic-velocity) material, located in the vicinity of presently active hotspots.

In the deep mantle, short-wavelength, low-velocity anomalies are difficult to image with standard seismic tomographic techniques, which typically rely on travel times of body waves (seismic waves that travel through the interior of Earth); such anomalies can be hidden from view by wavefront healing effects. Also, most hotspot volcanoes, and potentially any associated plumes, are located in the middle of oceans, where they are difficult to image owing to the lack both of dense seismic networks and of earthquakes with the appropriate geometry.

Thus, while various tomographic studies have hinted at the presence of plume-like features in the lower mantle associated with some subset of the major hotspots, ambiguity remains as to the vertical continuity of these features, how distinct they are from other low-velocity ‘blobs’ in the lower mantle, and whether they represent detection of the narrow type of plumes typically associated with purely thermal convection. To improve the resolution of low-velocity features of limited lateral extent such as plumes, two ingredients are needed: first, better illumination of Earth’s interior; and second, improved theoretical description of the interaction of the seismic wavefield with the three-dimensional Earth structure.

Here we present robust evidence for large, vertically continuous, low-velocity columns in the lower mantle beneath many prominent hotspots, from our recent global, radially anisotropic, whole-mantle shear-wave velocity model, SEMUCB-WM1 (ref. 4). This model was constructed by inversion of a large data set of full, long-period seismograms, including first- and second-orbit fundamental mode and overtone surface waves down to 60 s, as well as body waveforms down to 32 s. Because it includes surface-wave overtones, shear waves diffracted along the core-mantle boundary (Sdiff) and multiply reflected waves between the surface and the CMB, this data set provides considerably better illumination of the whole-mantle volume than can be obtained with a standard set of travel times alone. In addition, accurate numerical computation of the forward wavefield using the spectral element method at each iteration of the model construction allows us to better resolve regions of lower-than-average shear-wave velocity, as previously illustrated for the upper mantle. The construction of this model is briefly summarized in the Methods.

In model SEMUCB-WM1 (ref. 4), broad, dome-like plumes that show a reduction of shear-wave velocity by more than 1.5–2% are present in the lower mantle beneath Samoa, Hawaii and the Pacific Superswell volcanoes. These plumes are clearly distinct from other, more isolated and weaker low-velocity features that appear in cross-sections spanning half of Earth’s circumference (Fig. 1). The plumes are rooted in patches of more strongly reduced shear-wave velocity near the CMB, and extend vertically up to depths of at least 1,000 km, above which their character changes. A three-dimensional view of the central Pacific region (Fig. 2) shows that the cores of these plumes are well separated from each other across most of the lower mantle, embedded in the lower-than-average-velocity background of the Pacific LLSVP. This is particularly clear in the depth range 1,000 km to 1,500 km, where there is a one-to-one relationship with the corresponding hotspot volcanoes, although the plumes are not always located exactly beneath the volcanoes. Comparison with previous global models (Extended Data Fig. 1) indicates general agreement on the background long-wavelength features, while in SEMUCB-WM1 the plumes stand out as continuous features confined to well defined vertically oriented columns.

In particular, the Hawaiian plume appears as a separate vertical conduit of varying width (Fig. 2a–c), with a weaker zone at around 500 km above the CMB, rooted in its own patch of strongly reduced shear velocity at the base of the mantle. In the transition zone, this plume appears to be strongly deflected towards the west-southwest (Fig. 3). This morphology is compatible with evidence for a hot upper mantle to the west of Hawaii, based on the analysis of converted waves (that is, receiver functions). The presence of bodies with higher-than-average velocity southwest and northeast of the Hawaiian chain is in agreement with regional studies. However, in the lower mantle, the associated conduit is more vertically oriented in SEMUCB-WM1. Similar broad, vertically oriented low-velocity conduits are found in the vicinity of some hotspots lying on the border of the African LLSVP (Figs 1e, f and 3d, e and Extended Data Fig. 2).

The lower-mantle plume conduits described above are rooted in wide patches (of diameter 500–800 km) of strongly negative velocity reduction near the CMB. In at least three cases, these patches coincide in location with large ULVZs previously detected in the vicinity of the corresponding hotspots: near Hawaii (detected through observations of post-cursors to diffracted S waves), and beneath Iceland and Samoa (found through the study of waveform distortion in the phase SPdiffKS).

Figure 1 | Whole-mantle depth cross-sections of relative shear-velocity variations in model SEMUCB-WM1, in the vicinity of major hotspots. The sections are shown in the inset maps, with the direction of the projection indicated by the position of the purple dot in both map and cross-section views (black boxes correspond to the three-dimensional rendering regions in Fig. 2). Green dots and triangles mark the locations of hotspots. The reference model is the corresponding global one-dimensional average shear-wave velocity (V_S) profile of SEMUCB-WM1. The colour scale has been chosen to emphasize lower-mantle structures, resulting in substantial saturation in the upper mantle. Broken lines indicate depths of 410 km, 660 km and 1,000 km. Focused, quasi-vertical, broad plumes extend continuously from patches of strongly reduced V_S at the base of the mantle to depths of at least 1,000 km in the vicinity of: a, Samoa; b, Tahiti, the Marquesas, the Galapagos and Samoa; c, Pitcairn; d, MacDonald; e, Cape Verde; and f, the Canary Islands. These plumes stand out from other low-velocity features in these cross-sections, which span nearly half of Earth. d, Note the absence of a noticeable anomaly in the lower mantle immediately beneath the Yellowstone hotspot. However, a faint low-velocity conduit appears to the southwest (offshore of North America), anchored by a low-velocity patch in the D″ mantle region. It is beyond the resolution of our study to verify whether this feature is related to the Yellowstone or the Guadalupe (c) hotspot.


Because of the computational challenges of whole-mantle imaging using full waveforms and numerical simulations, the resolution of our model is limited by our choice of parametrization and maximum frequency. However, resolution tests (Methods; Extended Data Figs 4–8) clearly indicate that our approach can resolve the vertical continuity of plumes without ray-like smearing or erroneous deflection, and that the variations of the shape and amplitude of the plumes with depth are likely to be robust features. These tests also indicate (see Supplementary Information section S1 and Supplementary Figs 1 and 2) that our modelling approach can distinguish between hypothetical broad superplume-like features and the distinct vertical conduits that are shown in Fig. 1. Numerical experiments (Supplementary Information section S2 and Supplementary Figs 5–9) demonstrate that plumes of the same scale as those seen in Fig. 1 and used in our tests should be readily detectable in the waveform data used by our inversion. Furthermore, on the basis of relative amplitude recovery alone, our resolution tests also show that in order to obtain a velocity reduction of 2% or more over the major part of the lower mantle, as seen in our model, a narrow plume would have to be very strong (that is, >10% reduction in shear-wave velocity for a plume of width <200 km; see Methods and Extended Data Fig. 4). Such a strong velocity contrast would translate into unrealistically high effective temperature excesses of 1,500–2,000 °C. In contrast, for a 2% velocity anomaly over a width of 800–1,000 km, as imaged in SEMUCB-WM1 under Hawaii

©2015 Macmillan Publishers Limited. All rights reserved 


[Figure 2 colour scale: $\delta V_S/V_S$ (%), from −2.0 to +2.0]


Figure 2 | Three-dimensional rendering of shear-wave-velocity structure in
the Pacific Superswell region. Relative velocity perturbations are shown with
respect to the global average at each depth. For each panel, the location of the
box is shown in the inset map of Fig. 1. The region is shown from above, with
cuts at increasing depths: a, 410 km; b, 660 km; c, 1,000 km; d, 1,500 km;
e, 2,000 km; f, 2,500 km. The following hotspot locations, projected down from
the surface, are indicated by green cones in each box: Hawaii (top; north),
Samoa (left; west), and the four Superswell hotspots: Tahiti, Pitcairn,
Marquesas and MacDonald. Well-defined, vertically oriented conduits with
central cores of velocity lower than −1.5% can be associated with each of the
hotspots, particularly clearly in c and d. The low-velocity conduit beneath
Hawaii stands out in b–d. In f, patches of much lower-than-average velocity
start appearing within the Pacific LLSVP, continuing down to the CMB. At a
depth of 410 km, on the other hand, the low-velocity conduits start spreading
horizontally and merge into the depth range in which low-velocity fingers have
previously been observed in the upper mantle.


(Fig. 3), effective temperature excesses of 400–500 °C (estimated in a
similar manner) are plausible, if one considers that the plumes are
rooted in a chemically dense layer. Whether or not these plumes
entrain much chemical heterogeneity to the upper mantle is beyond
the resolution of this study; however, their width and vertically varying
shapes are compatible with models of thermochemical plumes.
Clearly resolved plumes with similar characteristics are found
beneath 11 major hotspots (Fig. 4 and Extended Data Table 1).
Visual inspection of the SEMUCB-WM1 model suggests the presence
of weaker conduits extending from the CMB through most of the
lower mantle beneath several other hotspots. Taken together, this
ensemble of hotspots includes all of those classified as ‘primary’ in
ref. 26. Interestingly, all of the plumes we detected are located within
or at the borders of the African and Pacific LLSVPs. In contrast, we
found no deep-mantle expression of those hotspots that are located
above faster-than-average shear-wave velocities in the mantle region
next to the CMB (the D″ region). This may indicate either that the
corresponding plumes are below the detection capabilities of our
current modelling approach in the lower mantle, or that they originate
in the upper mantle or the uppermost lower mantle. This is the case, in
particular, for the Yellowstone hotspot (Fig. 1d). In the case of the
Bowie/Juan de Fuca hotspots, the low-velocity anomaly can be
followed to the base of the upper mantle, but not below. Likewise, most
of the North African hotspots (except Afar and the neighbouring
East African Rift system, particularly beneath Tanzania) have no
clear lower-mantle expression. Instead, they appear to derive from a
very broad lower-mantle dome, much further south, associated with
the African LLSVP (Fig. 3d and Extended Data Fig. 3). The difference
in morphology between the Pacific (bundle of plumes) and African
(broad dome) LLSVPs, which has been suggested previously, is
now very clear.

Our results confirm the presence of broad plume-like conduits in the
lower mantle, located in the vicinity of major hotspots. For the first
time, to our knowledge, our study establishes their quasi-vertical
continuity from about 1,000-km depth down to the CMB, where they are
rooted in patches or domes of much lower-than-average shear velocity,
at least some of which (in Iceland, Hawaii and Samoa) coincide with the
location of known ULVZs. These plumes exhibit other common
characteristics. They are remarkably vertical, indicating that they are not
strongly affected by a lower-mantle wind, and may represent the
primary upwellings in the lower mantle. They often have a pinched
zone between 500 km and 1,000 km above the CMB, a shape that is
similar to what may be expected for thermochemical plumes.

Interestingly, the character of the anomaly appears to change at a
depth of around 1,000 km: some plumes are shifted horizontally (for
example, Pitcairn and St Helena); others become thinner (such as
Samoa and Tahiti); and yet others cannot clearly be tracked to
shallower depths (for example, Cape Verde), indicating that they may have
split into narrower conduits that, in some cases, are below the
resolution of our model. In Iceland and Hawaii (Fig. 3), horizontally
elongated arms branch out from the plume stem just below 1,000 km,
suggesting that the flow may have encountered resistance to direct
continuation into the upper mantle. Similarly, we observe what
appears to be ponding of low-velocity material beneath 1,000 km
(for example, at the Canary Islands; Fig. 1f). This is also the depth range
in which some slabs appear to stagnate. Those plumes that we can
track to shallower depths appear to meander through the upper mantle
and, in many cases, connect to the previously observed low-velocity
channels in the depth range 200–400 km (ref. 21; see, for example,
Pitcairn in Figs 1 and 2).

The observed characteristics of these seismically imaged plumes
should provide improved constraints on the viscosity structure and
dynamics of the mantle. The direct connection
of these plumes with ULVZs on the one hand, and with prominent
hotspots on the other, suggests that the plumes are long-lived and
contain material originating in the D″ region that is of lower viscosity
than the bulk lower mantle. The change of character at a depth of
1,000 km, taken together with the stagnation of some subducted
slabs, suggests a notable change in viscosity around
that depth. The variation in amplitude and width of these velocity
anomalies in the lower mantle may be related to the stage of
development of the plumes, to variations of the viscosity structure with
depth, or to a combination of viscosity structure and density
contrast owing to chemical heterogeneity at the base of the mantle. The
dominantly vertical character of the plumes in the lower mantle
indicates that circulation may be very sluggish away from the plumes. The
often-invoked mantle wind would be largely confined to the upper
mantle, where plumes may indeed become entrained, with a substantial
non-vertical component (for example, at Pitcairn and Hawaii), into the
more vigorous secondary-scale convection that is probably driven by
plate motions.

Our study demonstrates the potential of combining waveform 
tomography with accurate modelling of wave propagation in 


3 SEPTEMBER 2015 | VOL 525 | NATURE | 97 




LETTER 




Figure 3 | Hawaii, Iceland, St Helena and the African superplume. a, Three-dimensional
rendering of the Hawaiian plume, viewed from the southeast. In
the lower mantle, the plume conduit is vertically oriented, rooted in a patch of
very low $V_S$ at the base of the mantle, with a weak zone in the depth range
2,300–2,600 km. Above approximately 1,000 km, the conduit is deflected
towards the west into the transition zone, and appears to interact with the
low-velocity finger oriented in the Pacific plate absolute plate motion direction,
visible in cross-section above a depth of 500 km. b–e, Two-dimensional vertical
cross-sections along planes as indicated in the inset maps and corresponding
broken green lines in a, represented in the same way as in Fig. 1. In c, in the
lowermost mantle between the two westernmost reference white dots, we see
the edge of the plume associated with the Caroline hotspot. Subduction zones
are well imaged in the western Pacific (b, c), spreading above the 1,000-km
horizon, and in South America (d); in c and e, the fossil Farallon slab
extends through the lower mantle. Blue zones in the vicinity of Hawaii in
the lower mantle may be downwellings corresponding to other fossil
slabs, although this warrants further study. In e, we note the particularly broad
African plume, one lobe of which extends to the mantle transition zone, the
other giving rise in the mid lower mantle to a thinner, weaker plume beneath
St Helena.

[Figure 3 colour scale: $\delta V_S/V_S$ (%), from −2.0 to +2.0]

[Figure 4 map: SEMUCB-WM1 at 2,800-km depth. Legend: ‘primary’ plumes; clearly resolved; somewhat resolved; not associated with any hotspot]

Figure 4 | Locations of plumes detected in the lower mantle in model
SEMUCB-WM1. The background map represents the relative $V_S$ variations at
2,800 km in this model, with respect to the global average at that depth. We
identify three categories of plumes. ‘Primary’ plumes are those for which $\delta V_S/V_S$
is lower than −1.5% for most of the depth interval 1,000–2,800 km. These 11
plumes also correspond to regions of the lower mantle where the average
velocity reduction over the depth range 1,000–1,800 km is significant at the 2σ
level (see, for example, Supplementary Figs 3 and 4). Clearly resolved
plumes correspond to vertically continuous conduits with velocity reductions
($\delta V_S/V_S$) exceeding 0.5% in magnitude in the depth range 1,000–2,800 km.
Somewhat resolved plumes have vertically trending conduits with velocity
reductions exceeding 0.5% in magnitude for most of the depth range
1,000–2,800 km, albeit not as clearly continuous. Plumes are
numbered as listed in Extended Data Table 1. Green dots represent the
global hotspot distribution according to ref. 27. Note that none of the plumes
detected falls within a region of faster-than-average velocity at the base of
the mantle, and that long-wavelength structure in this model agrees with that of
previous tomographic models (see, for example, Supplementary Fig. 10).
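The three plume categories defined in the Figure 4 caption can be read as threshold rules on a conduit's $\delta V_S/V_S$ profile between 1,000 and 2,800 km. The sketch below is a hypothetical reading, not the authors' procedure. It interprets ‘most of the depth interval’ as more than half of the sampled depths, ‘vertically continuous’ as every sampled depth, and the 0.5% criterion as a velocity reduction of at least 0.5% in magnitude. The function name and the `most` parameter are illustrative.

```python
import numpy as np


def classify_conduit(dvs_pct, most=0.5):
    """Hypothetical plume category from dVs/Vs (%) sampled at depths 1,000-2,800 km.

    `most` is the assumed fraction of sampled depths that counts as
    "most of the depth interval".
    """
    dvs = np.asarray(dvs_pct, dtype=float)
    if np.mean(dvs < -1.5) > most:        # strong reduction over most depths
        return "primary"
    if np.all(dvs < -0.5):                # >0.5% reduction at every depth
        return "clearly resolved"
    if np.mean(dvs < -0.5) > most:        # >0.5% reduction at most depths
        return "somewhat resolved"
    return "not detected"


print(classify_conduit([-1.8, -2.0, -1.7, -1.2]))  # primary
```

Because the caption gives the criteria only qualitatively, the fraction and the continuity test are assumptions. A real classification would also involve visual inspection of the three-dimensional model.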


order to advance our understanding of the upwelling part of deep
mantle flow and its relationship to surface observations. While
thin ‘classical’ plume tails may exist in the mantle, they remain
below the resolution of global tomography at present. However,
our robust confirmation of the existence of broad, possibly
thermochemical, plumes (associated with prominent hotspots and
rooted in the D″ region, at least some of them in ULVZs)
should provide important constraints for further geodynamical
modelling of present-day mantle circulation, and for Earth’s
heat-flux budget.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Waveform inversion technique. As was the case for models SEMum and
SEMum2 (ref. 21), model SEMUCB-WM1 was developed using a time-domain
waveform inversion technique that combines highly accurate spectral-element
forward wavefield modelling with efficient sensitivity kernel computation using
nonlinear asymptotic coupling theory (NACT). This approach allows us to take
advantage of both the accuracy of computing the misfit function with
spectral-element modelling, and the efficiency of NACT-based kernels, which furthermore
allow us to use a quickly converging Gauss–Newton scheme. Because the
approximate Hessian matrix cannot easily fit in computer memory for
whole-mantle inversions such as SEMUCB-WM1, we designed a
high-performance distributed-memory abstraction for its assembly from parallel NACT
computations. As with other tomographic approaches, care must be taken in
selecting the starting model owing to the strong nonlinearity of the inverse
problem. In ref. 34, we explain how the present line of successive tomographic models
(SEMum, SEMum2, and now SEMUCB-WM1) started from a one-dimensional
earth model and progressively incorporated more waveform data and
shorter periods as iterations proceeded and three-dimensional structure became
stronger.
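The text states that NACT-based kernels provide an approximate Hessian, which makes a quickly converging Gauss–Newton scheme possible. As a generic illustration only, the sketch below applies a damped Gauss–Newton update to a two-parameter toy curve fit. Here $J^T J$ stands in for the approximate Hessian. The forward model, the damping value and the function names are hypothetical and unrelated to the actual spectral-element/NACT machinery.

```python
import numpy as np


def gauss_newton_step(m, forward, jacobian, d_obs, damping):
    """Return the damped Gauss-Newton update of model vector m.

    Solves (J^T J + damping * I) dm = J^T (d_obs - forward(m)),
    where J^T J plays the role of the approximate Hessian.
    """
    residual = d_obs - forward(m)                  # data misfit at current model
    J = jacobian(m)                                # partials d(data)/d(model)
    hessian = J.T @ J + damping * np.eye(m.size)   # approximate Hessian
    gradient = J.T @ residual
    return m + np.linalg.solve(hessian, gradient)


# Hypothetical toy problem: recover (a, b) in d(t) = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
m_true = np.array([2.0, -1.5])


def forward(m):
    return m[0] * np.exp(m[1] * t)


def jacobian(m):
    e = np.exp(m[1] * t)
    return np.column_stack([e, m[0] * t * e])


d_obs = forward(m_true)        # noise-free synthetic "data"
m = np.array([1.5, -1.0])      # starting model
for _ in range(20):
    m = gauss_newton_step(m, forward, jacobian, d_obs, damping=1e-8)
```

The toy problem starts reasonably close to the answer. This mirrors the point made above that, for a strongly nonlinear problem, the starting model must be chosen with care.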

Parameterization and starting model. Model SEMUCB-WM1 is constructed
starting from model SEMum2 above about 800-km depth and model
SAW24B16 below this depth (also used in the SEMum and SEMum2
inversions to account for lower-mantle structure). We invert for three-dimensional
variations in Voigt-average isotropic shear-wave velocity ($V_S$) and the radial anisotropy
parameter $\xi = (V_{SH}/V_{SV})^2$, with respect to a one-dimensional mean reference model
that evolves throughout the inversion. In contrast, the one-dimensional attenuation
model is fixed to a smoothed version of QL6 (ref. 39). We express perturbations to
mantle $V_S$ and $\xi$ in 20 cubic b-splines with variable spacing radially, and in spherical
splines laterally, with lateral spacing of nodes of less than 2° for $V_S$ and 8° for $\xi$.
Source parameters for each event are kept fixed to those reported in the Global CMT
catalogue (http://www.globalcmt.org). Tests indicate that allowing source
perturbations should not measurably affect the resulting structure at the current resolution.
From our SEMum2 + SAW24B16 starting model, three inversion iterations were
performed, while incrementally incorporating more and shorter-period data (see
‘Data selection and inversion’). We note that the upper-mantle part of the model
(down to 400 km) has not changed significantly in this process, further validating the
results described in ref. 4.
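The source parameterizes radial anisotropy through $\xi = (V_{SH}/V_{SV})^2$ alongside a Voigt-average isotropic $V_S$, but it does not write out the averaging formula. The sketch below assumes the commonly used Voigt relation $V_S^2 = (2V_{SV}^2 + V_{SH}^2)/3$. Under that assumption, the two inverted parameters can be converted to and from the vertically and horizontally polarized velocities. The function names and the example velocities are illustrative only.

```python
import math


def voigt_vs(vsv, vsh):
    """Voigt-average isotropic shear velocity, assuming Vs^2 = (2 Vsv^2 + Vsh^2) / 3."""
    return math.sqrt((2.0 * vsv**2 + vsh**2) / 3.0)


def xi_from(vsv, vsh):
    """Radial anisotropy parameter xi = (Vsh / Vsv)^2."""
    return (vsh / vsv) ** 2


def split_velocities(vs, xi):
    """Invert the relations above: Vsv = Vs * sqrt(3 / (2 + xi)), Vsh = Vsv * sqrt(xi)."""
    vsv = vs * math.sqrt(3.0 / (2.0 + xi))
    return vsv, vsv * math.sqrt(xi)


# Round trip with illustrative (not model-derived) velocities in km/s.
vsv, vsh = 4.40, 4.55
vs, xi = voigt_vs(vsv, vsh), xi_from(vsv, vsh)
vsv_back, vsh_back = split_velocities(vs, xi)
```

For $\xi = 1$ (isotropy), both polarizations reduce to $V_S$. This is one reason the pair $(V_S, \xi)$ is a convenient set of inversion parameters.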

Data selection and inversion. Our approach is described in detail in ref. 4 and 
includes discussions of how we calibrate our crustal model, introduce prior 
information in the data and model space, and assess model performance and 
uncertainties (including resolution and statistical resampling tests). Here we 
briefly recall details of the iterative inversion process, including the addition of 
new waveform data and tests of convergence. 

While SEMum and SEMum2 included only fundamental and overtone mode
surface waveforms at long periods (T > 60 s), here we include body-waveform data.
To this end, we assembled a data set comprising full three-component teleseismic
waveforms, filtered in multiple passbands, allowing us to incrementally incorporate
higher-frequency body-waveform data: 1, surface-wave passband: cut-off at 400 s
and 60 s (corners at 250 s and 80 s); 2, body-wave passband (filter I): cut-off at 300 s
and 36 s (corners at 180 s and 45 s); and 3, body-wave passband (filter II): cut-off at
300 s and 32 s (corners at 180 s and 38 s). In addition to incorporating shorter
periods, we also expanded our data set by incorporating additional events: starting
from the 203 events used in developing upper-mantle models SEMum and
SEMum2, we added 70 new events with moment magnitude $M_w$ = 5.8–7.3, chosen
to be spatially distributed in a complementary manner to the original set.
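Each passband above is specified by two cut-off periods and two corner periods. The text does not state the taper shape between a cut-off and its corner. The sketch below assumes a cosine taper applied in the frequency domain via the FFT, which is one common way to realize such a passband. The `PASSBANDS` table and the function names are illustrative.

```python
import numpy as np

# (long-period cut-off, corner, corner, short-period cut-off) in seconds, from the text.
PASSBANDS = {
    "surface": (400.0, 250.0, 80.0, 60.0),
    "body_filter_I": (300.0, 180.0, 45.0, 36.0),
    "body_filter_II": (300.0, 180.0, 38.0, 32.0),
}


def taper_response(freqs, cut_lo, corner_lo, corner_hi, cut_hi):
    """Assumed response: 0 outside the cut-offs, 1 between corners, cosine ramps between."""
    f1, f2, f3, f4 = 1.0 / cut_lo, 1.0 / corner_lo, 1.0 / corner_hi, 1.0 / cut_hi
    f = np.abs(np.asarray(freqs, dtype=float))
    resp = np.zeros_like(f)
    up = (f >= f1) & (f < f2)
    resp[up] = 0.5 * (1.0 - np.cos(np.pi * (f[up] - f1) / (f2 - f1)))
    resp[(f >= f2) & (f <= f3)] = 1.0
    down = (f > f3) & (f <= f4)
    resp[down] = 0.5 * (1.0 + np.cos(np.pi * (f[down] - f3) / (f4 - f3)))
    return resp


def bandpass(trace, dt, band):
    """Zero-phase filtering of a real trace sampled every dt seconds."""
    spectrum = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(trace.size, d=dt)
    spectrum *= taper_response(freqs, *PASSBANDS[band])  # real response: no phase shift
    return np.fft.irfft(spectrum, n=trace.size)
```

A real-valued response leaves the phase unchanged. This matters when waveforms are compared sample by sample with spectral-element synthetics filtered the same way.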

The inversion comprised three phases. In phase I, we performed one iteration of
inversion for whole-mantle structure using the 60-s surface-waveform and 36-s
body-waveform data sets (filter I) picked from the 203 events used in developing
SEMum and SEMum2, and constraints from surface-wave group-velocity maps
between 25 s and 150 s to enforce consistency with our crustal modelling scheme.
Because upper-mantle structure changed very little in this first iteration, aside
from slightly larger amplitudes following the introduction of the body-waveform
data, we chose to invert for structure only at depths greater than 300 km in the
remaining iterations (after first recalibrating the crustal model one last time; see
ref. 4 for details). In phase II, we introduced the 70 new events. This step involved
picking the new-event data, as well as reprocessing the older-event data, using the
spectral-element synthetics from the previous iteration (our waveform-data-picking
approach selects data on the basis of their similarity to the spectral-element
synthetics computed in the most recent iteration model). We then performed
another inversion iteration, again using the 60-s and 36-s filter passbands, but
now including the new-event data and inverting for structure below 300-km depth
only. In phase III, we again reprocessed the data from the complete 273-event data
set, but now using a new shorter-period body-wave passband (filter II). We then
inverted for structure below 300 km using the 60-s and 32-s data passbands.

To ensure that our inversion was converging, we determined, after each
iteration, whether more waveform windows were selected in the subsequent
data-reprocessing round than would have been selected using spectral-element
synthetics from the previous-iteration model. By the final iteration, we found only
small gains in the numbers of selected windows, indicating that the inversion had
probably converged for the particular passbands considered. We also assessed
convergence by testing fits to held-out waveforms from ten events not included
in the inversion, and found that these validation data exhibited fits quite similar to
those seen for the inversion data.

Computational cost. For the four rounds of spectral-element simulations
required to complete these three phases of inversion (including a last round to
assess the final fit to the data), the present study required about three million CPU
(central processing unit) hours, performed on Hopper, a Cray XE6 supercomputer
at the National Energy Research Scientific Computing Center (NERSC). The
NACT-based Hessian estimation and Gauss–Newton model update computations
were performed on NERSC Edison, a Cray XC30.

Resolution and model uncertainties. It is well known that standard linear
resolution analyses are strictly valid only for linear problems (or potentially near the
optimum of a nonlinear problem) and may also yield misleading results.
However, these remain useful techniques for assessing certain aspects of a
tomographic inversion, providing insight into potential issues related to data coverage
(for example, uneven sensitivity or smearing), the role of a priori information in
constraining model smoothness, and limitations of the chosen model basis. We
provide an extensive discussion of these aspects of SEMUCB-WM1 in the context
of resolution analysis in ref. 4; here we focus on analyses specifically relevant to the
types of conduit-like structures discussed in the main text.
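For a damped least-squares inversion, the textbook linear resolution operator is $R = (G^T G + \lambda I)^{-1} G^T G$. It maps a synthetic ‘true’ model to the model that the inversion would recover from noise-free data. The sketch below applies it to a hypothetical one-dimensional column of model cells whose middle segment is deliberately under-sampled. The matrix `G`, the damping `λ` and the cell layout are invented for illustration. They do not represent the approximate Hessian or the regularization actually used for SEMUCB-WM1.

```python
import numpy as np


def resolution_matrix(G, damping):
    """Damped least-squares resolution operator R = (G^T G + damping * I)^-1 G^T G."""
    gtg = G.T @ G
    return np.linalg.solve(gtg + damping * np.eye(gtg.shape[0]), gtg)


rng = np.random.default_rng(seed=0)
n_data, n_cells = 200, 50
G = rng.normal(size=(n_data, n_cells))   # hypothetical sensitivity matrix
G[:, 20:30] *= 0.05                      # cells 20-29 poorly sampled ("mid-mantle")

true_model = np.zeros(n_cells)
true_model[10:40] = -2.0                 # synthetic -2% conduit spanning cells 10-39

R = resolution_matrix(G, damping=10.0)
recovered = R @ true_model               # noise-free linear recovery

well = np.abs(recovered[10:20]).mean()   # mean recovered amplitude, well-sampled cells
poor = np.abs(recovered[20:30]).mean()   # mean recovered amplitude, poorly sampled cells
```

In this toy setting, the under-sampled segment is recovered with a much weaker amplitude than the well-sampled one. This is qualitatively the kind of depth-dependent amplitude loss that the resolution tests below are designed to expose.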

Recovery of whole- and partial-mantle plumes. One obvious question that can
be probed using resolution analysis is whether we can reasonably expect to resolve
plume-like conduits of the amplitude and scale observed in SEMUCB-WM1.
In Extended Data Figs 4 and 5, we examine the recovery of synthetic whole- and
partial-mantle plumes of diameter 1,000 km and 600 km beneath Hawaii and
Iceland. These synthetic test models have a peak amplitude of −2%, comparable
to many of the conduits observed in our model, and a cosine-cap lateral amplitude
profile. This means that the ‘core’ of each plume, exhibiting anomaly strength
>1%, is only half of its diameter: 500 km and 300 km, respectively. We also
examine recovery of plumes truncated at successively greater depths (1,000,
1,500 and 2,000 km) to assess vertical smearing. Artefacts above the truncation
depth in the synthetic input models are due to aliasing phenomena associated with the
radial b-spline basis used to parameterize our model (see above). Overall, we find
that all whole- and partial-mantle input plumes are recovered quite well beneath
both Hawaii, with denser data coverage, and Iceland, with comparatively sparser
coverage (there is a slight difference in amplitude recovery beneath the two). We
see no evidence of lateral smearing or (in the case of the truncated plumes) radial
smearing, nor do we detect notable gaps in recovery. Recovered amplitudes vary as
a function of depth, with comparatively weaker, but still satisfactory, recovery in
the less well sampled mid-mantle (about half of the input anomaly strength).
Furthermore, as noted in the main text, the pattern of amplitude variation with
depth seen in Extended Data Figs 4 and 5 does not match that of our imaged
plumes, which often show local amplitude maxima in the mid-mantle (not
minima, as suggested by these tests). Thus, while we cannot rule out that spatial
variation in sensitivity contributes in some way to the imaged amplitude
distribution, these results give us confidence that plumes of similar dimension and
amplitude to those seen in SEMUCB-WM1 should be recoverable.
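The ‘cosine-cap’ profile is described here only through one consequence: the >1% core of a −2% plume spans half the plume diameter. One profile with exactly that property is the raised cosine $A(r) = A_0 \cos^2(\pi r / D)$ for $r \le D/2$, and zero outside, where $D$ is the diameter. The sketch below assumes this form, which may differ from the exact cap used in the tests. It checks the half-width relation for the 1,000-km, 600-km and 400-km test diameters. The function names are illustrative.

```python
import numpy as np


def cosine_cap(r_km, diameter_km, peak=-2.0):
    """Assumed raised-cosine lateral profile: `peak` on the axis, 0 at and beyond r = D/2."""
    r = np.abs(np.asarray(r_km, dtype=float))
    profile = peak * np.cos(np.pi * r / diameter_km) ** 2
    return np.where(r <= diameter_km / 2.0, profile, 0.0)


def core_width(diameter_km, peak=-2.0, threshold=1.0):
    """Full width where |anomaly| >= threshold, solving |peak| cos^2(pi r / D) = threshold."""
    r_edge = diameter_km / np.pi * np.arccos(np.sqrt(threshold / abs(peak)))
    return 2.0 * r_edge


for d in (1000.0, 600.0, 400.0):
    # prints core widths of 500, 300 and 200 km
    print(f"D = {d:.0f} km -> core (>1%) width = {core_width(d):.0f} km")
```

Under this assumed profile, the 400-km plume of Extended Data Fig. 6 has a core only about 200 km wide.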

In Extended Data Fig. 6, we consider a narrower plume (400 km in diameter)
spanning from the CMB to 1,000-km depth, with a peak amplitude of −2%. We
observe that the output structure is at least 800 km in width, but is also significantly
weaker than the input, exhibiting a maximum amplitude of −0.6% near its base,
while only reaching −0.3% or −0.4% elsewhere in its core. This indicates that an
actual plume of width less than 400 km would have to be much stronger to be
properly detected in our model (see further discussion in ‘Estimated excess
temperatures and actual width of plumes’).

Recovery of ‘hanging’ plume structures. To supplement these analyses, we also
explored tests using synthetic ‘hanging’ plumes: columnar anomalies extending
down from the surface into the upper mantle and transition zone. These plumes
have a diameter of 600 km and again a cosine-cap cross-section, as well as −2%
maximum amplitude, but are now truncated at depths of 410 km or 1,000 km
(Extended Data Figs 7 and 8). This experiment is designed to further assess the
effect of depth smearing, and again examines two geographic locations: Hawaii,
with denser, and Iceland, with sparser, data coverage. In general, the retrieved
output structures are remarkably symmetrical and exhibit the correct depth extent,
with the exception of the plume truncated at 1,000-km depth beneath Hawaii,
which shows a slight eastward-trending band extending to the CMB. We note that
this artefact is very weak, everywhere less than 0.1% amplitude (that is, at least 20
times weaker than the −2% input amplitude). Furthermore, this artefact in no way
resembles the plume we image beneath Hawaii in SEMUCB-WM1; it has a very
different trend and amplitude profile. Turning now to the results beneath Iceland,
we again see that retrieved amplitudes are comparatively weaker, presumably
because of the sparser data coverage, but at the same time we find no anomalous
band. This observation suggests that while Hawaii is expected to have denser data
coverage, the data coverage beneath Hawaii might also be more anisotropically
distributed. Indeed, this would be consistent with the eastward or east-southeast
trending streaks seen at mid-mantle depths in some travel-time-based local
tomographic studies in this region. Overall, these results give us further confidence that
the depth extensions of the plumes we image in SEMUCB-WM1 cannot be
attributed to smearing.

Estimated excess temperatures and actual width of plumes. Our resolution tests 
indicate that we can easily resolve a synthetic columnar velocity anomaly over a 
width of 1,000 km and maximum amplitude of 2% (Extended Data Fig. 4). This is 
indeed comparable to the maximum amplitudes of the plume conduits imaged in 
the mid- and lower mantle in our model. If the actual width of a plume is instead 
significantly smaller than 1,000 km, then the average velocity anomaly should be 
correspondingly larger to attain the same imaged amplitudes. For example, if the 
actual plume diameter is 600 km (Extended Data Fig. 5), then the actual velocity 
anomaly would have to be on the order of 4%-5% (as the recovered amplitudes 
in the 600-km-width case are on the order of 0.5 times those obtained for the 
1,000-km-width case). Similarly, an even narrower plume, with a diameter of 
400 km or less (Extended Data Fig. 6), should require a velocity contrast within 
the plume of more than 10% to be detected in our inversion. 
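The scaling argument in this paragraph treats the inversion as linear. Under that assumption, a true anomaly is approximately the imaged amplitude divided by the fractional amplitude recovery found in the corresponding resolution test. The recovery ratios used below come from the numbers quoted above: about 0.5 for the 600-km case relative to the 1,000-km case, and about −0.3% to −0.4% recovered in the core from a −2% input for the 400-km case. The helper name is hypothetical.

```python
def required_true_anomaly(imaged_pct, recovery_ratio):
    """Linear-resolution estimate of the true anomaly that would image as `imaged_pct`."""
    if not 0.0 < recovery_ratio <= 1.0:
        raise ValueError("recovery_ratio must lie in (0, 1]")
    return imaged_pct / recovery_ratio


imaged = -2.0  # % amplitude of the conduits imaged in the mid- and lower mantle

# 600-km plume: recovered amplitudes about 0.5 times those of the 1,000-km case.
print(required_true_anomaly(imaged, 0.5))  # prints -4.0

# 400-km plume: -0.3% to -0.4% recovered in the core from a -2% input.
for recovered in (-0.4, -0.3):
    print(required_true_anomaly(imaged, recovered / -2.0))  # about -10.0 and -13.3
```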

Assuming the velocity anomaly is due to temperature alone, a 2% reduction in $V_S$
translates into about 200 K excess temperature ($\Delta T$) in the upper mantle.
However, partial derivatives of shear velocity with respect to temperature greatly
decrease in magnitude with pressure (that is, with depth in the mantle). While they are not
precisely known at lower-mantle conditions, a factor-of-two reduction at
mid-to-lower mantle depths is a reasonable assumption, translating a 2% reduction in
$V_S$ into $\Delta T \approx 400$ K, and a 10% reduction into a most unrealistic $\Delta T$ in excess of 2,000 K. Thus,
assuming that the relative amplitude recovery in our resolution analyses is
representative of reality, and that plumes are purely thermal, it is far more plausible
that we are correctly resolving broader, weaker plumes in the lower mantle than
poorly resolving very strong, narrower ones.
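The arithmetic in this paragraph follows from $\Delta T = |\delta V_S/V_S| \,/\, |\partial \ln V_S / \partial T|$. The upper-mantle sensitivity of 0.01% per K is the value implied by ‘2% corresponds to about 200 K’. The lower-mantle value applies the factor-of-two reduction assumed above. As in the text, the conversion is purely thermal, and the helper name is hypothetical.

```python
UPPER_MANTLE_SENS = 0.01                   # % change in Vs per K, implied by 2% <-> ~200 K
LOWER_MANTLE_SENS = UPPER_MANTLE_SENS / 2  # factor-of-two reduction assumed in the text


def excess_temperature_k(dvs_pct, sens_pct_per_k):
    """Purely thermal estimate: Delta T = |dVs/Vs| / |d ln Vs / dT|."""
    return abs(dvs_pct) / sens_pct_per_k


for label, dvs, sens in [
    ("upper mantle, -2%", -2.0, UPPER_MANTLE_SENS),
    ("lower mantle, -2%", -2.0, LOWER_MANTLE_SENS),
    ("lower mantle, -10%", -10.0, LOWER_MANTLE_SENS),
]:
    # prints about 200 K, 400 K and 2000 K
    print(f"{label}: about {excess_temperature_k(dvs, sens):.0f} K")
```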
Extended Data Figure 1 | Intermodel comparisons. These figures correspond
to the cross-sections in Fig. 1c and d, oriented normal to the direction of Pacific
absolute plate motion. As in Fig. 1, sections are indicated in the inset
maps, while white and purple circles indicate position along section and
orientation. Shown are relative shear-wave velocity ($V_S$) anomalies in models
SEMUCB-WM1 (this study), S40RTS (ref. 46), PRI-S05 (ref. 47), HMSL-S06
(ref. 48) and GyPSuM (ref. 49), each plotted with respect to its own
one-dimensional reference (where the latter notion is well defined: see for example
ref. 48; where defined, the one-dimensional reference is often the global
average). Panels a–e correspond to the MacDonald-hotspot-centred view of
Fig. 1d; panels f–j correspond to the Pitcairn-centred view of Fig. 1c. This
comparison shows that the five models are broadly compatible with each
other at long wavelengths. However, in the lower mantle, the MacDonald and
Pitcairn plumes are much more clearly defined as vertical conduits in
SEMUCB-WM1, and stand out as the strongest and most continuous
low-velocity features in the lower mantle in these cross-sections (which span
almost half of Earth’s circumference).
Extended Data Figure 2 | Intermodel comparisons. These comparisons
correspond to Fig. 1e, f, presented in a similar manner to those in Extended
Data Fig. 1. We again find that models SEMUCB-WM1, S40RTS, PRI-S05,
HMSL-S06 and GyPSuM are broadly compatible with each other at long
wavelengths. However, in the lower mantle, the plumes beneath both Cape
Verde and Canary are more clearly defined as well-isolated vertical conduits in
SEMUCB-WM1. Furthermore, while we do observe some degree of
correspondence between the two plumes imaged in SEMUCB-WM1 and some
anomalies also present in PRI-S05 or HMSL (for example, the plume root at
the CMB beneath Cape Verde, or the lateral translation of the plume
around 1,000 km beneath Canary), the unambiguously columnar nature of the
anomalies imaged in SEMUCB-WM1 stands in stark contrast.




Extended Data Figure 3 | Inter-model comparisons. These cross-sections
are similar to those in Extended Data Fig. 1, but now feature two approximately
orthogonal sections through the African LLSVP: a–e, traversing from
northwest to southeast; f–j, traversing from southwest to northeast. The
African LLSVP is more massive, and therefore better resolved in S40RTS,
PRI-S05, HMSL-S06 and GyPSuM than are other plumes. As in Extended Data
Fig. 2, we again note some degree of similarity between SEMUCB-WM1
(a), PRI-S05 (c) and HMSL-S06 (d) below the Cape Verde plume.
Extended Data Figure 4 | Linear resolution analysis, examining recovery of
synthetic whole- and partial-mantle plumes of width 1,000 km beneath
Hawaii and Iceland. Synthetic plume input models, shown in the upper row of
panels, have a peak amplitude of −2% and a cosine-cap lateral amplitude
profile (thus, the effective width above 1% anomaly strength is only 500 km). In
addition to looking at a whole-mantle plume, we also examine recovery of
plumes truncated at successively greater depths (1,000 km, 1,500 km and
2,000 km) to assess vertical smearing. Artefacts seen above the truncation depth
in the synthetic input models are due to slight aliasing phenomena associated
with the radial b-spline basis functions used to parameterize our model.
We find that all four input plumes are recovered quite well beneath both Hawaii
(centre row), with relatively denser data coverage, and Iceland (bottom row),
with comparatively sparser coverage, although there is a slight difference in
amplitude recovery beneath the two (maximum amplitude recovered is shown
for each panel). Importantly, we see no evidence of lateral (or, in the case of
the truncated plumes, radial) smearing, nor do we detect significant gaps in
recovery. However, recovered amplitude does vary as a function of depth, with
comparatively weaker, although still satisfactory, recovery in the less well
sampled mid-mantle (of the order of half of the input anomaly strength). For a
more thorough discussion of the caveats implied by linear resolution analysis in
the context of our inversion, as well as additional resolution tests, see ref. 4.
Extended Data Figure 5 | Linear resolution analysis similar to that in
Extended Data Fig. 4. This analysis again features whole- and partial-mantle
plumes of peak strength −2%, but now of width 600 km (meaning the effective
width above 1% anomaly strength is only 300 km). We again find that the
synthetic input plumes are recovered quite well, with no evidence of lateral or
radial smearing and no gaps in recovery. At the same time, we find that
recovered amplitude is poorer than for the larger, 1,000-km-width plumes,
in some cases recovering amplitudes of the order of one-quarter of the input,
and we note that there is again a slight disparity in amplitude recovery between
Hawaii and Iceland. Furthermore, tests using synthetic plumes
at or below widths of 600 km push the limits of the spherical-spline lateral basis
functions used in our model, particularly in the upper mantle, where the
inter-spline absolute distance is larger (although the angular distance remains
constant).
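A fixed angular node spacing corresponds to a larger absolute spacing in the upper mantle because arc length is $s = r\theta$. The sketch below uses the 2° lateral node spacing for $V_S$ given in the Methods. It assumes standard radii of 6,371 km for the surface and about 3,480 km for the CMB. The names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of Earth
CMB_RADIUS_KM = 3480.0    # approximate radius of the core-mantle boundary


def node_spacing_km(angular_spacing_deg, radius_km):
    """Arc length between neighbouring lateral nodes at a given radius: s = r * theta."""
    return radius_km * math.radians(angular_spacing_deg)


for name, radius in [("surface", EARTH_RADIUS_KM),
                     ("660-km depth", EARTH_RADIUS_KM - 660.0),
                     ("CMB", CMB_RADIUS_KM)]:
    # prints about 222 km, 199 km and 121 km
    print(f"{name}: {node_spacing_km(2.0, radius):.0f} km")
```

At the surface, the same 2° spacing spans nearly twice the distance it spans at the CMB. This is why narrow test plumes strain the lateral basis most in the upper mantle.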




[Extended Data Figure 6 panels: 400-km-width synthetic plume; input model (max. −2.0%); output model (max. −0.6%); colour scale $\delta \ln V_S$ from −2.0% to +2.0%]


Extended Data Figure 6 | Further linear resolution analysis. This analysis is
of a 400-km-width plume-like conduit extending from the CMB to 1,000-km
depth, with a lateral profile (a cosine cap) and maximum amplitude
(−2%) similar to those of the test structures in Extended Data Figs 4 and 5. The inherent limits
of our spherical spline basis prevent us from representing this narrow conduit
with sufficient fidelity for the purposes of this test above 1,000 km. Upper
panel, conduit-like input structure; lower panel, output structure resulting
from the resolution test. We observe that the output structure is at least 800 km in
width, but is also significantly weaker than the input, exhibiting a maximum
amplitude of −0.6% near its base, while only reaching −0.3% or −0.4%
elsewhere in its core. As such, we can infer that the input-structure amplitudes
would need to be increased by at least 10× in order to maintain amplitudes near
−2.0% throughout the majority of the lower mantle. This latter observation
has implications for effective excess temperature (see Methods).
Extended Data Figure 7 | Linear resolution analysis for synthetic ‘hanging’
plume input structures in the upper mantle and transition zone. Like those in
Extended Data Figs 4 and 5, these plumes have an overall width of 600 km and a
cosine-cap lateral cross-section, as well as −2% maximum amplitude, but are now
cut at 410-km (left panels) or 1,000-km (right panels) depth. This experiment is
designed to assess the effect of depth-smearing in SEMUCB-WM1. Upper panels,
hanging-plume input models. Lower panels, output models when inputs are placed
beneath Hawaii. We note that in general the structures retrieved are quite
symmetrical and exhibit the appropriate depth extent, with the exception of the
plume truncated at 1,000 km, which shows a weak eastward-trending band extending
to the CMB. We note that this artefact is very weak, generally less than 0.1%
amplitude, as illustrated in the bottom panel, where structure below 0.1% is masked
(that is, the band is at least 20× weaker than the −2% input structure).
Furthermore, we note that this feature is not at all like the plume we image
beneath Hawaii, as it possesses a very different trend and amplitude profile.
Extended Data Figure 8 | Linear resolution analysis for synthetic ‘hanging’
plume input structures in the upper mantle and transition zone. This figure is
similar to Extended Data Fig. 7, but now examines recovery beneath Iceland.
Upper panels, hanging-plume input models. Lower panels, output models when
inputs are placed beneath Iceland. The retrieved structures are again quite
symmetrical and exhibit the appropriate depth extent, although amplitude
recovery is slightly less impressive than that observed beneath Hawaii
(consistent with the results of Extended Data Figs 4 and 5).
Extended Data Table 1 | Plumes detected in the lower mantle in model SEMUCB-WM1, and corresponding hotspots

Index of plume (Fig. 4) | Hotspot name | Ranking by Courtillot (ref. 26) | Buoyancy flux | ³He/⁴He
Category 1: "Primary Plume"
1 | Afar | 4 | 1 | high
2 | Canary | 2 | 1 | low
3 | Cape Verde | 2 | 1.6 | high
4 | Comores | 0+ | ? | ?
5 | Hawaii | 4+ | 8.7 | high
6 | Iceland | 4+ | 1.4 | high
7 | Macdonald | at | 4.3 | high?
8 | Marquesas | 2+ | 3.3 | low
9 | Pitcairn | 2+ | 3.3 | high?
10 | Samoa | 4 | 1.6 | high
11 | Tahiti/Society | 2 | oud | high?
Category 2: "Clearly resolved"
12 | Cameroon | 0+ | ? | ?
13 | Caroline | 3 | 2 | high
14 | Easter | 4+ | 3 | high
15 | Galapagos | a | 1 | high
16 | Louisville | at | 0.9 | ?
17 | Reunion | 4 | 1.9 | high
18 | St Helena | 1 | 0.5 | low
19 | Tristan | 3 | 1.7 | low
20 | Kerguelen | at | 0.5 | high
Category 3: "Somewhat resolved"
21 | Ascension | 0+ | 1 | ?
22 | Azores | 1+ | 1.1 | high?
23 | Bouvet | Ly | 0.4 | high
24 | Crozet/Pr.Edw/ | 0+ | 0.5 | ?
25 | Hoggar | 1 | 0.9 | ?
26 | Juan Fernandez | a+ | 1.6 | high
27 | San Felix | 1+ | 1.6 | ?
Not associated with any known hotspot:
Indonesia

The numbering (column 1) and categories correspond to those in Fig. 4. Plumes are categorized as primary if the corresponding low-velocity conduit in the lower mantle has δVS/VS less than −1.5% for most of the
depth interval 1,000–2,800 km. These 11 plumes also correspond to regions of the lower mantle where the average velocity reduction over the depth range 1,000–1,800 km is significant at the 2σ level (see, for
example, Supplementary Figs 3 and 4). Clearly resolved plumes correspond to vertically continuous conduits with δVS/VS < −0.5% in the depth range 1,000–2,800 km. Somewhat resolved plumes have vertically
trending conduits with δVS/VS < −0.5% for most of the depth range 1,000–2,800 km, albeit not as clearly continuous. The only clearly resolved plume in the lower mantle that is not near a hotspot is in Indonesia,
possibly because it is rising beneath a broad slab. However, it occurs close to a location where high ³He/⁴He ratios have been observed. For comparison, we list the corresponding hotspot ranking (column 3; ref. 26), as
well as the buoyancy flux (column 4) and ³He/⁴He ratios (column 5). Question marks indicate no value given in ref. 26. Note that in this previous ranking of hotspots, these estimates of buoyancy flux and ³He/⁴He
ratios were used together with the velocity anomaly values in the transition zone (500-km depth) from an older tomographic shear-velocity model. In contrast, our ranking is based entirely on the continuity of
broad vertically oriented low-velocity structures across the major part of the lower mantle. Hotspots that do not have any clear expression in the lower mantle in model SEMUCB-WM1 are not listed, namely
Yellowstone, Juan de Fuca/Cobb and Bowie (see also Fig. 4).
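The three categories in the footnote amount to threshold rules on a δVS/VS depth profile along each conduit. The sketch below applies them to a profile sampled between 1,000 and 2,800 km. It is not the authors' procedure. Reading "for most of the depth interval" as "more than half of the samples" and "vertically continuous" as "every sample" are our assumptions, the 2σ significance check is not modelled, and the function name and example profiles are hypothetical.

```python
from typing import Sequence

def categorize_plume(dvs_percent: Sequence[float]) -> str:
    """Return a plume category from dVs/Vs samples (%) along a conduit.

    Samples are assumed to span the 1,000-2,800 km depth interval at regular
    spacing. Thresholds follow the table footnote; reading 'most of the
    interval' as 'more than half of the samples' and 'vertically continuous'
    as 'every sample' are assumptions made for this sketch.
    """
    n = len(dvs_percent)
    if n == 0:
        raise ValueError("profile must contain at least one sample")
    share_below_1p5 = sum(v < -1.5 for v in dvs_percent) / n
    share_below_0p5 = sum(v < -0.5 for v in dvs_percent) / n
    if share_below_1p5 > 0.5:
        return "primary"            # Category 1
    if share_below_0p5 == 1.0:
        return "clearly resolved"   # Category 2: continuous below -0.5%
    if share_below_0p5 > 0.5:
        return "somewhat resolved"  # Category 3: mostly below -0.5%, gaps allowed
    return "unresolved"             # would not be listed in the table

# Hypothetical profiles, not values from SEMUCB-WM1.
print(categorize_plume([-1.8, -1.7, -1.6, -1.4, -1.9]))  # primary
print(categorize_plume([-0.9, -0.7, -0.6, -0.8]))        # clearly resolved
```

The primary test is applied first because a conduit below −1.5% for most of its length would also pass the weaker −0.5% tests.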
All around the globe, humans have greatly altered the abiotic and
biotic environment with ever-increasing speed. One defining fea-
ture of the Anthropocene epoch is the erosion of biogeographical
barriers by human-mediated dispersal of species into new regions,
where they can naturalize and cause ecological, economic and
social damage. So far, no comprehensive analysis of the global
accumulation and exchange of alien plant species between conti- 
nents has been performed, primarily because of a lack of data. Here 
we bridge this knowledge gap by using a unique global database on 
the occurrences of naturalized alien plant species in 481 mainland 
and 362 island regions. In total, 13,168 plant species, correspond- 
ing to 3.9% of the extant global vascular flora, or approximately the 
size of the native European flora, have become naturalized some- 
where on the globe as a result of human activity. North America has 
accumulated the largest number of naturalized species, whereas the 
Pacific Islands show the fastest increase in species numbers with 
respect to their land area. Continents in the Northern Hemisphere 
have been the major donors of naturalized alien species to all other 
continents. Our results quantify for the first time the extent of 
plant naturalizations worldwide, and illustrate the urgent need 
for globally integrated efforts to control, manage and understand 
the spread of alien species. 

The magnitude of impacts caused by alien species on native biota
and human societies is increasing rapidly. However, our knowledge of
the global spread and distribution of naturalized species (that is, alien
species that form self-sustaining populations in new regions) is
still very limited. Nevertheless, there are many presumptions about
the distributions and patterns of spread of alien species. For example,
it has frequently been suggested that Old World species have spread
more widely outside their native ranges than New World species,
owing to human colonization history or intrinsic evolutionary
superiority. It has also been suggested that islands have more alien
species than mainland areas, among other reasons because of unfilled niche
space on islands or, as shown for birds, a higher introduction effort.
Although these hypotheses have been tested for some parts of the
world, global tests are still lacking.

Scientific and societal concerns about alien species have led to 
improved documentation of their distributions, and inventories have 
become available for many regions. Many of these inventories are
still incomplete, especially for megadiverse taxonomic groups that are
difficult to survey, such as invertebrates and microorganisms, and for
less well-surveyed regions. However, vascular plants are well docu-
mented because of long histories of exploration. Recently, there have
been several major efforts to combine inventories of alien species for
large geographical regions (for example, Delivering Alien Invasive
Species Inventories for Europe (DAISIE; http://www.europe-aliens.org/))
and for those considered to be the most problematic invaders
globally. However, a global database of the distribution of all natur-
alized alien plant species has not yet been built. Such data are essential
for understanding global naturalization patterns and their underlying
processes, reporting biodiversity status in terms of essential biodiver-
sity variables, and informing environmental managers across polit-
ical borders via early warning systems.

Here, we present an analysis of naturalized vascular plant species in 
843 non-overlapping regions (countries, federal states, islands) cover- 
ing ~83% of the Earth’s land surface (Fig. 1). We used a novel data- 
base, Global Naturalized Alien Flora (GloNAF), combined with data 
on the origins of the naturalized species and estimates of the numbers 
of native species per continent, to assess (1) which continents have 
accumulated the largest naturalized floras, and (2) which have been 
the major donors of naturalized alien plant species to other parts of 
the world. 
Figure 1 | Naturalized vascular plant species in the 843 regions covered by
the GloNAF database. The heat-map colours correspond to the number of 
naturalized species in each of the regions (including 362 island regions). Areas 
permanently covered by ice sheets are indicated in hatched cyan. Grey areas 
indicate regions lacking naturalized plant data. To allow comparisons between 
the sizes of the GloNAF regions, we used a Mollweide equal-area projection. 
However, to increase the visibility of small islands and island groups on the 
map, they are represented by circles. 


We found that at least 13,168 vascular plant species have become 
naturalized in at least one of the 843 regions (including 362 islands) 
(Fig. 1). As there were no data available for approximately 17% of the 
Earth’s land area, particularly in temperate Asia (Fig. 1), and some of 
the regional inventories used might not be fully comprehensive, the 
actual number is likely to be even higher. This means that at least 3.9% 
of all currently known vascular plant species on Earth (n = 337,137; 
http://www.theplantlist.org/) have become naturalized outside their 
natural ranges as a result of human activity. With continuing globali- 
zation and increasing international traffic and trade, it is very likely 
that more species will be introduced outside their natural ranges and 
naturalize. 

To assess which continents have accumulated the highest number of 
naturalized species, we assigned each of the GloNAF regions to the 
nine major biogeographically defined areas recognized by the 
Biodiversity Information Standards (also known as the Taxonomic 
Databases Working Group (TDWG); Fig. 2a). Since the areas of
the TDWG continental scheme (further referred to as TDWG con-
tinents) differ greatly in size, we created accumulation curves of
naturalized species to allow comparisons of the number of naturalized
plants per continent for equal areas. When ignoring differences in
total area, North America has the highest cumulative number of nat- 
uralized species (n = 5,958), followed by Europe (n = 4,140; Fig. 2b). 
Although the rich naturalized floras of these continents could partly 
reflect a higher sampling intensity in these continents, it is likely that 
they also reflect a higher introduction effort. Both continents have 
dominated international trade for centuries, and many plants have 
been intentionally introduced from other continents for agricultural 
and horticultural purposes.

Although North America has a longer history of European coloniza- 
tion than Australasia, it received only slightly more naturalized species 
from outside the continent (3,513) than the latter (3,371; Fig. 2c). 
However, Australasia has even more such extra-continental species 
than North America when taking into account area differences 
(Fig. 2c). One possible explanation is that Australia’s long biogeogra- 
phical isolation and drying climate have resulted in a native flora that is 
phylogenetically distinct, but not well adapted to exploit the novel
habitats created by European settlers. These new habitats have instead 
been occupied by many incoming alien plant species. 

When only extra-continental arrivals are considered, Europe drops 
to fifth position, just behind Africa (Fig. 2c). Thus, although many 
plants from other continents have been introduced into Europe,
few of them have naturalized. One explanation might be that plants 
Figure 2 | Naturalized species-accumulation curves for the major
biogeographical areas. a, Map of the nine TDWG continents. Hatched areas 
indicate major permanent ice sheets. b, Naturalized species-accumulation 
curves (1,000 random draws) for each of the nine continents. c, Same as b but 
here naturalized species are restricted to extra-continental aliens only. The 
colours in b and c correspond to the colours of the continents in a. Vertical and 
horizontal dashed lines mark the total area of the continent and its total number 
of naturalized plants, respectively. To increase visibility, thicker lines were used 
for Pacific Islands and Antarctica. 


that spread through Europe with agriculture several thousand years 
ago (so-called archaeophytes), and European species that naturalized 
within the continent more recently, have already occupied many of the 
vacant niches, preventing many extra-continental species from nat- 
uralizing. In addition, extra-continental species might be relatively 
maladapted to the human-dominated environments in Europe, com-
pared with species already present there, which have a longer evolu-
tionary history of growing in these environments.
The Pacific Islands show the steepest increase in cumulative number
of naturalized species with area (Fig. 2). Therefore, our data provide
the first global test illustrating that oceanic islands harbour more nat-
uralized alien plants than similarly sized mainland regions, a phenom-
enon that has been attributed to the available niche space not being saturated
by native species or to a higher number of introductions. Given the
high concentration of endemic species on most oceanic islands, the
great richness of naturalized species on these islands constitutes a 
serious threat to global biodiversity. 

TDWG continents with large tropical regions (Africa, South 
America and tropical Asia) have, overall, fewer naturalized alien spe- 
cies than the predominantly temperate continents (North America, 
Europe and Australasia). This is consistent with previous observations 
suggesting a higher resistance of tropical regions to the establish- 
ment of alien species because of fewer available free ecological niches, 
faster recovery of vegetation after disturbance or a lower introduction 
rate. Temperate Asia, in contrast, shows a very low rate of accu-
mulation of naturalized species with area. Unlike other continents,
most of temperate Asia has not been colonized by Europeans 
(http://commons.wikimedia.org/wiki/Atlas_of_colonialism), and large 
parts of it have only recently opened up to inward movements of people 
and plants. With the recent rise of China as a major trade partner, we
might expect a rapid increase of naturalized species in temperate Asia in 
the coming decades. 

To identify the major donor continents of naturalized alien plant 
species, we assigned each naturalized species to its native continent(s). 
On the basis of estimated numbers of native species per continent, one 
would expect the most species-rich TDWG continents (South America 
and tropical Asia) to be the main donors of naturalized plant species 
(Fig. 3a); but they are not. The observed flow of naturalized plant 
species clearly shows that temperate Asia and Europe are the major 
donors (Fig. 3b). Although temperate Asia is ahead of Europe in abso- 
lute numbers, the observed number of species native to Europe and 
naturalized elsewhere is 288% higher than expected, but only 52% 
higher than expected for temperate Asia (Extended Data Fig. 1 and 
Extended Data Table 1). Furthermore, North America is also over- 
represented, with 57% more species donated than expected (Extended 
Data Fig. 1). In contrast, the TDWG continents that are largely in the 
Southern Hemisphere are all underrepresented as donors (Extended 
Data Fig. 1). These results are robust against potential over- or under- 
estimates of the number of native species per continent (see Extended 
Data Table 1 for a sensitivity analysis). This suggests that the tradition-
ally acknowledged Old World versus New World dichotomy in
Figure 3 | Flows of naturalized alien plant species among the TDWG
continents. a, Expected flows (medians of 999 random draws) of naturalized 
species on the basis of estimated numbers of native species (in brackets). 

b, Observed flows of naturalized species. The continents are ordered according 
to decreasing importance as sources. Only the 50% most important flows are 
biological invasions needs to be replaced by a Northern Hemi-
sphere versus Southern Hemisphere dichotomy for the donor conti-
nents of naturalized alien plants globally. Darwin suggested that
Northern Hemisphere species, as a consequence of a more competitive 
evolutionary history, are intrinsically better competitors than 
Southern Hemisphere species, and that this could explain their nat- 
uralization success. To determine whether this is indeed the case 
requires further research. Nevertheless, the fact that the Southern 
Hemisphere is currently underrepresented as a donor might also indi- 
cate that the southern continents still harbour many species that could 
potentially spread to northern continents when given the chance. 
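The donor percentages quoted above are relative excesses of the observed number of donated species over the expected (median) number. Writing O for the observed and E for the expected number (symbols introduced here for illustration only), the usual reading of "X% higher than expected" gives:

```latex
\[
  \text{excess} = \frac{O - E}{E} \times 100\%
  \quad\Longrightarrow\quad
  O = E\left(1 + \frac{\text{excess}}{100\%}\right)
\]
\[
  288\% \;\Rightarrow\; O = 3.88\,E, \qquad
  57\% \;\Rightarrow\; O = 1.57\,E, \qquad
  52\% \;\Rightarrow\; O = 1.52\,E
\]
```

Read this way, Europe donated nearly four times as many naturalized species as expected from its native flora, whereas temperate Asia donated about one and a half times as many.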

For six of the nine TDWG continents, the observed intra-contin- 
ental flows were larger than expected (Fig. 3 and Extended Data Fig. 1). 
Because of the shorter distances, intra-continental propagule pressure 
can be assumed to have been larger, and because of environmental 
similarity, subsequent naturalization chances are higher for intra-
continental alien species. Notable exceptions with fewer than
expected intra-continental naturalizations were South America and 
tropical Asia. We argue that because many species from these conti-
nents have restricted ranges (reflected in relatively high levels of
regional endemism), species from tropical Asia and South America
are less likely to have been dispersed outside their native ranges. 

The recently compiled GloNAF database has enabled the most 
comprehensive analysis so far of the global distributions of naturalized 
alien plant species, and provides the first robust estimates of the flows 
of naturalized plant species worldwide. We reveal striking differences 
within and among continents in the sizes of their naturalized alien 
floras, rates of accumulation of naturalized species with respect to area, 
and relative importance as exporters of naturalized species. Humans 
have strongly shaped the geographical composition and global 
distribution of alien plants among the world’s continents, with the
Northern Hemisphere being the major donor. The Pacific Islands and 
Australasia harbour the highest numbers of naturalized alien species, 
given their sizes and the extent of naturalization of species from other 
continents. The GloNAF database and the robust large-scale patterns 
we reveal here provide a vital foundation for testing fundamental 
hypotheses to understand plant naturalization better. For example, 
when combined with native plant inventories and phylogenetic data, 
the database will allow quantification of the degree of global floristic 
homogenization and tests to determine whether naturalized species 
are more closely or more distantly related to native species. In addi-
tion, the global baseline data of plant naturalizations provided here
might contribute an essential biodiversity variable needed to monitor 




shown. Ant., Antarctica (n = 293 native species); C, only known from 
cultivation or novel hybrids (n = 97 species). Each tick along the outer circle 
corresponds to 1,000 species. Left (white) parts of inner bars along the circle 
represent flows of imported species; right (coloured) parts represent exported 
species. 
changes in global biodiversity, and can inform evidence-based man-
agement of alien species.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Data compilation. No statistical methods were used to predetermine sample size. 
The GloNAF database includes inventories of naturalized alien plant species (including
infraspecific and hybrid taxa) for 843 regions worldwide. The data sources
that we used (see Supplementary Information) include naturalized alien plant 
compendia, national and subnational lists of naturalized alien plant species pub- 
lished in scientific journals, as books or on the internet, as well as books and online 
compendia of national or subnational floras with information on which species 
occur in the wild but are not native. Our database also includes unpublished 
inventories of naturalized alien species that were specifically compiled for the 
GloNAF database (for example, for the provinces of China and the states of 
India). We consider those alien species that have established self-sustaining popu- 
lations without direct human intervention to be naturalized, following refs 4 and 
27. The GloNAF database will be fully publicly available after finalizing funded 
GloNAF projects (Deutsche Forschungsgemeinschaft and Austrian Science Fund 
FWF), which are due to end in about 3 years.

As certain regions of the world are more intensively researched than others, it is 
unavoidable that some of the regional inventories of naturalized alien species are 
more comprehensive than others. We aimed to include the most comprehensive 
and most recent regional inventories. Indeed, more than 95% of the data sources 
are from the past two decades (see Supplementary Data). Moreover, since some of 
the original source lists included alien species that are cultivated only or have non- 
persistent populations in the wild, we excluded those species whenever such 
information was provided, or contacted experts on the regional floras to remove
species of doubtful naturalization status. Furthermore, for European countries that 
differentiated between archaeophytes (alien species that came before the year 
1492) and neophytes (species that came after the year 1492), we kept only the 
latter, because the alien status of some species classified as archaeophytes is dis- 
puted; moreover, this classification is not available for other regions of the world, 
and thus would prevent us from achieving a balanced/standardized assessment of 
naturalized alien species numbers. 

To standardize scientific names, each naturalized plant inventory was com- 
pared with The Plant List (http://www.theplantlist.org/), the most comprehensive 
working list of all plant species. This taxonomic standardization was done with
the help of the R package Taxonstand. For each species, we kept the name
accepted by The Plant List. Species that were not found in The Plant List, even
after accounting for spelling differences, were kept in the database using the names
as used in the source data. In total, the database includes 13,168 species of which 
13,033 are recognized by The Plant List (12,498 as accepted and 535 as unresolved 
names). The remaining 135 species do not occur in The Plant List, and among 
those 11 are ornamental cultivars. 

For each species in the database, we compiled data on which of the nine regions 
of the TDWG continental scheme (TDWG continents) it was native to, or
whether it was known only from cultivation or resulted from hybridization 
between two alien species or an alien and a native species. Most of the native- 
range data were extracted from the World Checklist of Selected Plant Families 
(WCSP; http://apps.kew.org/wcsp/), and supplemented with data from the 
Germplasm Resources Information Network (http://www.ars-grin.gov/cgi-bin/ 
npgs/html/index.pl). For the approximately 4,000 species that were not included 
in these two major data sources, we retrieved information on the native regions 
from printed floristic compendia, extensive internet searches and comparisons of 
their naturalized distributions to their overall distributions in the Global 
Biodiversity Information Facility (http://www.gbif.org/). Information about native 
continents was found for 13,070 species, of which 219 are only known from 
cultivation and 51 are novel hybrids. Many (5,646) species were native to more 
than one continent. For the few (98) remaining species, we could not find any 
information on their native ranges. 

Each of the 843 regions covered by GloNAF was assigned to one of the nine
TDWG continents. We calculated the area of each region while considering only
the ice-sheet-free areas of each region, ranging from 0.03 to 2,486,952 km², with a
median of 18,725 km².
Accumulation of naturalized species per continent. To determine which con- 
tinent accumulated the highest number of naturalized species for a certain area, we 
constructed species-accumulation curves separately for each of the nine TDWG
continents. Since choosing a starting region and the order of adding remaining 
regions to the species-accumulation curves would be arbitrary, we used a random 
order of regions, and repeated this procedure 1,000 times. Species-accumulation 
curves were calculated for all alien species and for extra-continental alien species 
separately. This analysis was done in the R package vegan.
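The randomized accumulation procedure can be illustrated in a few lines. The analysis itself was done in vegan. The Python below only sketches the same idea and does not reproduce vegan's implementation. It assumes each region is represented by its ice-free area and its set of naturalized species, and the region data and function name are hypothetical. In each permutation, regions are added in random order while cumulative area and cumulative distinct species are recorded. Averaging over permutations then gives one curve per continent.

```python
import random
from typing import Dict, List, Set, Tuple

def accumulation_curve(
    regions: Dict[str, Tuple[float, Set[str]]],
    n_perm: int = 1000,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Mean (cumulative area, cumulative species count) after adding k regions.

    `regions` maps a region name to (ice-free area in km^2, set of naturalized
    species). Regions are added in a random order in each of n_perm permutations.
    """
    rng = random.Random(seed)
    names = list(regions)
    area_sum = [0.0] * len(names)
    rich_sum = [0.0] * len(names)
    for _ in range(n_perm):
        order = names[:]
        rng.shuffle(order)                  # random starting region and order
        seen: Set[str] = set()
        area = 0.0
        for k, name in enumerate(order):
            region_area, species = regions[name]
            area += region_area
            seen |= species                 # species already counted are not recounted
            area_sum[k] += area
            rich_sum[k] += len(seen)
    return [(a / n_perm, s / n_perm) for a, s in zip(area_sum, rich_sum)]

# Hypothetical continent with three regions (areas in km^2, species sets).
example = {"A": (100.0, {"s1", "s2"}), "B": (50.0, {"s2", "s3"}), "C": (25.0, {"s4"})}
print(accumulation_curve(example, n_perm=200)[-1])  # (175.0, 4.0)
```

Once every region has been added, the curve ends at the continent's total area and total naturalized flora, whatever the order. This corresponds to the dashed lines in Fig. 2. The extra-continental curves would follow by first removing species native to the continent from each region's set.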


Flows of naturalized alien species among continents. To test whether the 
observed flows of naturalized species from donor continents to recipient conti- 
nents were larger or smaller than expected, we compared the observed flows with 
those based on random draws from the extant global flora. Since no data on the 
number of native species per TDWG continent exist, we first estimated these 
numbers by extrapolation of the known native origins of 130,641 accepted vascular 
plant species in the WCSP (http://apps.kew.org/wcsp/) to the total number of 
337,137 accepted species in The Plant List (http://www.theplantlist.org/). 
Although the WCSP includes quite a large proportion (38.8%) of all vascular plant 
species, it does not include all vascular plant families yet, and it might be geo- 
graphically biased. However, ref. 32 showed that all 52 TDWG level-2 regions, and 
thus the TDWG continents also, are well represented in the WCSP. Furthermore, 
our estimates did not deviate much from published estimates we found for some of 
the continents: our estimate of 62,193 native species for Africa is close to the 
previously estimated 40,000–60,000 for the African mainland, and the 64,500
species listed in the African Plants Database (http://www.ville-ge.ch/musinfo/bd/ 
cjb/africa/). Our estimate of 14,148 native species for Europe is slightly higher 
than the 12,517 native species listed in Flora Europaea. Our estimate of
30,054 native species for North America is higher than the 21,500 species listed 
in the Biota of North America Program (http://www.bonap.org/), but the latter 
does not include species of Mexico. Our estimate of 22,891 native species 
for Australasia is higher than the 19,324 reported for Australia by the Austra- 
lian National Herbarium (https://www.anbg.gov.au/aust-veg/australian-flora- 
statistics.html), but the latter does not cover all parts of Australasia (for example, 
New Zealand). Therefore, although our estimates of the native species richness of 
each continent are higher than previous estimates, these differences seem to result 
mainly from additional regions included in TDWG continents and gaps in the 
other data sources. Thus our estimates appear to be realistic proxies for the true
numbers of continental species richness. 
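The text does not spell out how the WCSP origins are extrapolated to the full flora. One simple reading, assumed here for illustration only, is proportional scaling. Under that reading, each origin category (a single continent or a combination of continents) keeps its share of the 130,641 WCSP species and is scaled up to the 337,137 accepted species in The Plant List. The two totals come from the text, while the per-category counts below are placeholders and not WCSP data.

```python
from typing import Dict

WCSP_TOTAL = 130_641    # accepted WCSP species with known native origins (from the text)
GLOBAL_TOTAL = 337_137  # accepted vascular plant species in The Plant List (from the text)

def extrapolate_richness(wcsp_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale WCSP counts per origin category up to the global species total.

    Assumes (for illustration) that WCSP is a proportionally representative
    sample, so each category keeps its WCSP share of the total. Keys may be
    single continents or combinations such as 'Europe+Temperate Asia'.
    """
    total = sum(wcsp_counts.values())
    return {k: v / total * GLOBAL_TOTAL for k, v in wcsp_counts.items()}

coverage = WCSP_TOTAL / GLOBAL_TOTAL  # share of all species in the WCSP

# Hypothetical counts (NOT real WCSP data) that sum to WCSP_TOTAL.
example = {"Continent A": 100_000, "Continent B": 30_641}
print(f"WCSP coverage: {coverage:.1%}")  # WCSP coverage: 38.8%
print(extrapolate_richness(example))
```

The coverage ratio reproduces the 38.8% quoted above. Any geographical bias in the WCSP would carry straight through this kind of scaling, which is why the text checks the resulting estimates against independent continental tallies.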

To obtain the expected flows of species from donor to recipient continents, we 
first created a species pool with a size equal to that of the extant global vascular- 
plant species pool (n = 337,137), in which the proportion of species native to each 
continent or combination of continents was based on the estimated native species 
richness of the continents. Then, for each recipient continent, we drew separately a 
random sample of species from the extrapolated global species pool. The size of the 
random sample was equal to the number of naturalized alien species observed in 
the recipient continent. We then recorded the number of randomly drawn species 
native to each continent or belonging to the pool of species known from cultivation 
or as novel hybrids. This random-draw procedure was repeated 999 times, and the 
medians are shown in Fig. 3a. We did this for each recipient continent separately to 
allow for the fact that a species can be naturalized in more than one continent. If 
the observed flow of species from a donor continent to a recipient continent was 
within the upper 2.5% of the random distribution, we considered the observed flow 
to be significantly larger than expected by chance; if the observed flow was within 
the lower 2.5% of this distribution, we considered the flow to be significantly lower 
than expected by chance. Since we might have over- or underestimated the native 
species richness for some continents, we also did a sensitivity analysis by decreas- 
ing and increasing the size of the native flora of each continent by 10% in turn (see 
Supplementary Information). R code for the random draws is available from the
corresponding author on request. Flow plots were created using R code
adapted from ref. 35.
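The random-draw test can be sketched as follows. This is an illustrative Python version, not the authors' R code. The function names and the tiny example pool are hypothetical, and drawing without replacement via `random.Random.sample` is our assumption. For one recipient continent, the sketch draws as many species as are naturalized there from a pool in which every species carries a donor label, then counts draws per donor. It reports the median expectation and the proportion P of draws smaller than the observed count, flagging P > 0.975 and P < 0.025 as in Extended Data Table 1.

```python
import random
from collections import Counter
from statistics import median
from typing import Dict, List

def null_flows(
    pool_counts: Dict[str, int],  # estimated native species per donor category
    n_naturalized: int,           # naturalized species observed in the recipient
    n_draws: int = 999,
    seed: int = 1,
) -> Dict[str, List[int]]:
    """Random-draw null distribution of species counts per donor category."""
    rng = random.Random(seed)
    pool = [donor for donor, n in pool_counts.items() for _ in range(n)]
    draws: Dict[str, List[int]] = {donor: [] for donor in pool_counts}
    for _ in range(n_draws):
        counts = Counter(rng.sample(pool, n_naturalized))  # without replacement
        for donor in pool_counts:
            draws[donor].append(counts[donor])  # Counter gives 0 for absent donors
    return draws

def classify(observed: int, expected: List[int]) -> Dict[str, object]:
    """Median expectation, P = share of draws below the observation, and a label."""
    p = sum(e < observed for e in expected) / len(expected)
    label = "over" if p > 0.975 else "under" if p < 0.025 else "as expected"
    return {"median": median(expected), "P": p, "label": label}

# Hypothetical pool and recipient, not GloNAF values.
draws = null_flows({"Donor A": 100, "Donor B": 900}, n_naturalized=50, n_draws=200)
print(classify(40, draws["Donor A"]))  # far above the expectation of about 5
```

Running the draws once per recipient continent mirrors the text's reason for doing so, since one species can be naturalized on several continents. Rerunning with one donor's pool count multiplied by 0.9 or 1.1 would mimic the ±10% sensitivity analysis.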
Extended Data Figure 1 | Observed and expected numbers of naturalized
species from each donor TDWG continent in each of the recipient TDWG
continents. Histograms of the expected numbers are shown in black open
bars, and are based on 999 random draws from the global flora (n = 337,137).
The observed numbers are shown as vertical lines; blue, significantly fewer
observed naturalized species from the source continent than expected (in the
lowest 2.5% of the expected distribution); red, significantly more observed naturalized species
than expected (in the highest 2.5%); black, the observed number of
naturalized species is within the central 95% range of the expected numbers.
[Figure axis label: Number of species]
Extended Data Table 1 | Results of sensitivity analysis for observed and expected numbers of naturalized species from each donor continent in each of the recipient continents

[Table columns (recipient continent): South America, Tropical Asia, Asia, North America, Australasia, Pacific Islands]
For each combination of two TDWG continents, the table gives in bold type the observed number (No.) of species that are native to the donor continent and have become naturalized in the recipient continent. Below each observed number is the median of the expected number on the basis of 999 random draws from the global vascular flora (n = 337,137). Below this median, the minimum and maximum median values of the expected numbers found during the sensitivity analysis are given in sloping type. In addition, the table gives the proportion (P) of the 999 random draws for the expected values that were smaller than the observed values. The minimum and maximum proportions found during the sensitivity analysis are given in sloping type. Proportions >0.975 (the source is overrepresented in the recipient continent) are given in red; proportions <0.025 (the source is underrepresented) are given in blue.
Genetic evidence for two founding populations of the Americas

Pontus Skoglund, Swapan Mallick, Maria Cátira Bortolini, Niru Chennagiri, Tábita Hünemeier, Maria Luiza Petzl-Erler, Francisco Mauro Salzano, Nick Patterson & David Reich


Genetic studies have consistently indicated a single common origin of 
Native American groups from Central and South America’. 
However, some morphological studies have suggested a more com- 
plex picture, whereby the northeast Asian affinities of present-day 
Native Americans contrast with a distinctive morphology seen in 
some of the earliest American skeletons, which share traits with pre- 
sent-day Australasians (indigenous groups in Australia, Melanesia, 
and island Southeast Asia)**. Here we analyse genome-wide data to 
show that some Amazonian Native Americans descend partly from a 
Native American founding population that carried ancestry more 
closely related to indigenous Australians, New Guineans and 
Andaman Islanders than to any present-day Eurasians or Native 
Americans. This signature is not present to the same extent, or at 
all, in present-day Northern and Central Americans or in a ~12,600- 
year-old Clovis-associated genome, suggesting a more diverse set of 
founding populations of the Americas than previously accepted. 

All Native American groups studied to date can trace all or much of 
their ancestry to a single ancestral population that probably migrated 
across the Bering land bridge from Asia more than 15,000 years ago’, 
with some Northern American and Arctic groups also tracing other 
parts of their ancestry to more recent waves of migration*®"®. Ancient 
genomic evidence has shown that this so-called ‘First American’ 
ancestry is present in an individual associated with Clovis technology 
from North America dating to ~ 12,600 years ago’, and mitochondrial 
DNA has suggested that it was also present by 13,000–14,500 years
ago’'””. In contrast, some morphological analyses of early skeletons in
the Americas have suggested that characteristics of some Pleistocene 
and early Holocene skeletons fall outside the variation of present-day 
Native Americans and instead fall within the variation of present-day 
indigenous Australians, Melanesians and so-called ‘Negrito’ groups 
from Southeast Asia (and some sub-Saharan African groups)’. 
This morphology has been hypothesized to reflect an initial
‘Paleoamerican’ pioneer population in the Americas, which according 
to some interpretations was largely replaced by populations with 
Northeast Asian affinities in the early Holocene, but may have per- 
sisted in some locations’***. However, morphological similarity can 
arise not only through shared descent but also through convergent 
evolution or phenotypic plasticity coupled with similar environments’*’”. Another limitation of morphological data is that it provides
very few independent characters that can be analysed. Genome-wide 
data, with its hundreds of thousands of independent characters that 
evolve effectively neutrally, should be a statistically powerful and 
robust way to test whether a distinct lineage contributed to Native 
Americans. 

Analysis of population history in the Americas is complicated by 
post-Columbian admixture from mainly European and African 
sources’. We identified 63 individuals without discernible evidence
of European or African ancestry in 21 Native American populations
genotyped at ~600,000 single nucleotide polymorphisms (SNPs) on
the Affymetrix Human Origins array'*’’ (Extended Data Fig. 1 and
Supplementary Information section 1). We further restricted our stud- 
ies to individuals from Central and South America that have the 
strongest evidence of deriving entirely from a homogeneous First 
American ancestral population’. We computed all possible f4-statistics
of the form f4(American1, American2; outgroup1, outgroup2), the product of the allele frequency differences between the two American
groups and the two outgroups. We represented the Americans by a 
panel of 7 Central and South American groups, and the outgroups by 
24 populations (4 from each of 6 worldwide regions). If the two Native 
American groups descend from a homogeneous ancestral population 
whose ancestors separated from the outgroups at earlier times, it fol- 
lows that the difference in allele frequencies between Native American 
populations will have developed entirely after their separation from 
the outgroups, and so the correlation in allele frequency differences is 
expected to be zero. To evaluate whether all possible f4-statistics computed in this way are consistent with zero, correcting for multiple
hypothesis testing due to the large number of statistics examined,
we measured the empirical covariance of the matrix of f4-statistics
using a block jackknife’’, and performed a single Hotelling’s T² test
for consistency with zero. We reject the null hypothesis at high sig- 
nificance (P=2%X10~’), suggesting that the analysed Native 
American populations do not all descend from a homogeneous ances- 
tral population since separation from the outgroups (Extended Data 
Table 1 and Supplementary Information section 2). The coefficients 
for which non-American populations contribute the most to the sig- 
nals separate Native Americans into a cline with two Amazonian 
groups (Surui and Karitiana) on one extreme and Mesoamericans 
on the other (Extended Data Fig. 2). Among the outgroups, the most 
similar coefficients to Amazonian groups are found in Australasian 
populations: the Onge from the Andaman Islands in the Bay of Bengal 
(a so-called ‘Negrito’ group), New Guineans, Papuans and indigenous
Australians (Supplementary Information section 2). 

We extended our analysis to 197 non-American populations 
sampled worldwide’**°. We computed D-statistics”’ to test whether a 
randomly drawn derived allele from each worldwide population has an 
equal probability of matching a randomly drawn Mesoamerican or 
Amazonian chromosome at sites where these differ. This test takes as 
its null hypothesis the tree-like population history (Test population, 
(Mesoamericans, Amazonians)), and produces a positive D-statistic 
only in the case of excess affinity between the test population and 
Amazonians (negative values in the case of an excess affinity with 
Mesoamericans). Consistent with the signals observed when many 
populations are analysed together, we find that Andamanese Onge, 
Papuans, New Guineans, indigenous Australians and Mamanwa 
Negritos from the Philippines all share significantly more derived 
alleles with the Amazonians (4.6 > Z > 3.0 standard errors (s.e.) from 
zero) (Extended Data Table 2). No population shares significantly more 
derived alleles with the Mesoamericans than with the Amazonians. We 
find consistent results for this test not only for Onge, Papuans, New
Guineans and indigenous Australians as representatives of Australasian 
populations, but also for different outgroups in place of chimpanzee: 
Africans, Europeans and East Asians (2.8 < Z < 4.8) (Supplementary 
Information section 3). In Fig. 1, we show a quantile-quantile plot of 
D-statistics contrasting the Mesoamerican Mixe and the Amazonian 
Surui, revealing Australasian populations as the only discernible outliers.

We replicated the significant evidence for affinity between 
Australasians and Amazonians using D-statistics computed on 
Illumina SNP array data’ (as an alternative to the Affymetrix Human 
Origins SNP array data) (2.6 < Z < 3.0) and on high-coverage genome 
sequences from 3 Yoruba, 2 Surui, 3 Mixe and 16 Papuans (18 of these 
genomes are reported for the first time here’; Table 1) (Z = 4.3). In 
addition to the three independent molecular experiments that these 
data sets represent, we find consistent results for all different mutation 
classes in the high-coverage genomes (2.6 < Z < 4.3), and different
ascertainment schemes (for example, in polymorphisms discovered
in Africans, New Guineans and East Asians) (Supplementary
Information section 3) (1.1 < Z < 3.3 for panels with >20,000 SNPs).
We also find consistent results for two differently genotyped subsets of
Surui individuals from a total of 24 individuals’ (Table 1 and Extended 
Data Fig. 3a) (2.6 < Z < 3.6). Simulations (Supplementary Information 
section 3) show that genotype and sequence errors cannot explain the
magnitude of the observed signal (Extended Data Fig. 3b). Finally, we 
generated new data from 9 populations from present-day Brazil using 
the Affymetrix Human Origins array, including previously untested 
individuals from the Amazonian Surui and Karitiana for which DNA 
was extracted from blood. These new samples replicate the signal, and 
furthermore show that the signal is also strong in the Xavante 
(1.3 < Z < 3.25), a population of the Brazilian Central Plateau which
speaks a language of the Ge group that is different from the Tupi 
language group to which the languages of the Karitiana and Surui both 
belong. We do not detect any excess affinity to Australasians in the 
~12,600-year-old Clovis-associated Anzick individual from western 
Montana (Z = −0.6) (Supplementary Information section 3).

To test if the significant D-statistics have the patterns expected for a 
genuine admixture event, we stratified the high coverage genomes into 
deciles of ‘B-values’*, which measure proximity to functionally
important regions. Genuinely significant D-statistics are expected to 
be of larger magnitude closer to genes, as selection increases variability 
in fitness of haplotypes near functionally important regions, which in 
turn increases the genetic drift in these regions and the absolute mag- 
nitude of D-statistics**”°, a prediction that we confirmed empirically 
(Extended Data Fig. 4). We computed D(Yoruba, Papuan; Mixe, 
Figure 1 | South Americans share ancestry with Australasian populations that is not seen in Mesoamericans or North Americans. a, Quantile–quantile plot of the Z-scores for the D-statistic symmetry test for whether Mixe and Surui share an equal rate of derived alleles with a candidate non-American population, X, compared to the expected ranked quantiles for the same number of normally distributed values. b, Z-scores for the h4-statistic. c, Z-scores for the ChromoPainter statistic. d, Heatmap of ChromoPainter statistics. For non-Americans we display the symmetry statistic S(non-American; Mixe, Surui and Karitiana) for donating as many haplotypes to Mixe as to Surui and Karitiana. For the Americas we plot S(Onge; Mixe, American) for receiving as many haplotypes from the Onge as do the Mixe.
Table 1 | Statistics testing the consistency of the tree (Yoruba, (Papuan, (Mixe, Surui))) with the data

                                                       Test statistic   Z-score   Informative loci
High-coverage genomes                                  0.0211           4.26      798,873
  A/T SNPs                                             0.0169           2.63      60,538
  A/G SNPs                                             0.0191           3.64      268,962
  A/C SNPs                                             0.0208           3.49      67,210
  G/T SNPs                                             0.0248           4.27      67,623
  C/T SNPs                                             0.0220           4.24      270,133
  C/G SNPs                                             0.0248           4.26      64,951
Illumina array, Surui samples from HGDP                0.0076           2.63      247,814
Illumina array, Surui samples not in HGDP              0.0081           3.02      249,941
Affymetrix Human Origins array (Surui cell lines)      0.0099           3.63      318,544
Affymetrix Human Origins array (Surui blood samples)   0.0072           2.57      313,349
h4-statistic (Affymetrix, Yoruba ascertainment)        0.0003           4.60      14,938
Chromosome painting symmetry test                      0.0026           5.26      -

Note: except for the new h4-statistics and chromosome painting symmetry tests, which are explicitly noted, all statistics are D-statistics*’. Z-scores were obtained by computing standard errors using a weighted block jackknife.


Surui) separately for each bin, and found that it is of larger magnitude
close to functionally important regions (Extended Data Fig. 4)
(Z = −2.0 for the slope of a linear regression model), as expected for
a real admixture event. A caveat is that when we formally combine the
evidence from the genome-wide D-statistic and the correlation to the
B-value, the significance (Z = 3.6 s.e. from 0) is not any greater than for
the basic D = 0.021 ± 0.005 statistic (Z = 4.2 s.e. from 0) because the
two statistics co-vary. Nevertheless, the fact that the correlation with 
B-values is significant by itself and in the expected direction adds to the 
qualitative evidence for an admixture event. 

Alternative approaches for testing for admixture involve detecting 
admixture linkage disequilibrium in a test population that is correlated 
to allele frequency differentiation between two populations that are 
related to the sources”””*. We devised a statistic ‘h4’ that is analogous to
an f4-statistic, but instead of studying allele frequencies, it tests whether
the linkage disequilibrium patterns of two populations are consistent
with descending from a common ancestral population since separation from two outgroups. A classic statistic for measuring linkage
disequilibrium in a population A is $H^A = p^A_{12} - p^A_1 p^A_2$, which measures
the extent to which a haplotype of two derived mutations occurring
at frequency $p^A_{12}$ is observed more or less frequently than would be
expected from the individual frequencies of alleles 1 and 2 ($p^A_1$ and $p^A_2$).
Thus, we define h4(A, B; C, D) as the average of $(H^A - H^B)(H^C - H^D)$
across the genome, and view a deviation from zero as evidence against
the unrooted tree ((A, B), (C, D)). We used loci ascertained as polymorphic in African Yoruba, which is effectively an outgroup to the
other populations analysed here, to test h4(Yoruba, X; Mixe, Surui) for
all SNP pairs within 0.01 centimorgans (cM) and for a large set of 
worldwide non-African populations, and obtained normalized 
Z-scores by estimating the number of standard errors this quantity 
is from zero using a block jackknife. Although Z-scores computed for 
most of 120 non-Americans and non-Africans as population X con- 
form to a normal distribution (Fig. 1b), we again found significant 
evidence of excess affinity of the Surui to Australasian populations 
(Z=5.7, P= 10 ° for New Guineans; Z = 4.6, P= 10 ° for Papuans; 
Z=4A, P=10° for Andamanese). When we exclude the 
Australasians, we detect no evidence of correlation between 
Z-transformed h4- and f4-statistics for the remaining 114 populations
(R = −0.026), suggesting that h4 can provide evidence independent of
allele-frequency-based statistics. Although h4 can theoretically be
biased by loss of polymorphism due to bottlenecks (Supplementary 
Information section 4), there is no evidence that this is a problem for 
our analysis as East Asian and Siberian populations with comparable 
loss of polymorphism do not show an affinity to Amazonians by this 
statistic (Extended Data Fig. 5). In addition, there is a high degree of 
correlation between significant h4- and D-statistics in empirical data 
(Extended Data Fig. 5). Computing h4(Yoruba, Onge; Mixe, Surui)
over windows of increasingly large genetic distances reveals that it 
dissipates at approximately 0.2 cM. This is an order of magnitude 
smaller than linkage disequilibrium caused by admixture events at 
the ~4,000-year upper limit of previous methods", but at a larger scale
than the signal of admixture between Neanderthals and non-Africans 
37,000-86,000 years ago” (Extended Data Fig. 5). 

As a third population symmetry test, we applied a method for detecting shared haplotypes between individuals (‘chromosome painting’) to
infer in each Native American individual which non-American chro- 
mosome segment each American chromosome segment shares the clos- 
est affinity to, using a set of 174 non-American populations as references. 
We then performed a symmetry test for a candidate population sharing 
more haplotypes with a given non-American population than the 
Mesoamerican Mixe do, performing a block jackknife across all chromo- 
somes (weighting to correct for variation in chromosome length) to 
assess uncertainty. We find that the blood and cell line Surui are signifi- 
cantly closer to the Onge than the Mixe are (Z = 5.3) (Fig. 1c), as are the 
blood and cell line Karitiana samples (Z= 4.2 to 5.0), the Xavante 
(Z = 4.3), and the Piapoco and Guarani (Z > 3) (Fig. 1d). In contrast, 
populations from west of the Andes or north of the Panama isthmus 
show no significant evidence of an affinity to the Onge (Z<2). An 
exception to this is the Cabecar, who have previously been shown to 
be partially admixed from a source south of the Panama isthmus’. 

The geographic distribution of the shared genetic signal between 
South Americans and Australasians cannot be explained by post- 
Columbian African, European or Polynesian gene flow into Native 
American populations. If such gene flow produced signals strong 
enough to affect our statistics, our statistics would show their strongest 
deviations from zero for African, European or Polynesian populations, 
which is not observed. For example, a direct test is significant in 
showing that the Surui-specific ancestry component is genetically clo- 
ser to the Andamanese Onge than to Tongans from Polynesia 
(D = 0.0094, Z = 3.4). 

To investigate models consistent with the data, we studied admix- 
ture graph models relating the ancestry of Native American groups to 
Han Chinese and Onge Andaman Islanders, incorporating a prev- 
iously described admixture event into Native American ancestors from 
a lineage related to an ~24,000-year-old Upper Paleolithic individual 
from Mal’ta in Siberia* (denoted as MA1). We are unable to fit 
Amazonians as forming a clade with the Mesoamericans, or as having 
a different proportion of ancestry related to Mal’ta or present-day East 
Asians. Thus, our signal cannot be explained by lineages that have 
previously been documented as having contributed to Native 
American populations. However, we do find that a model where 
Amazonians receive ancestry from the lineage leading to the 
Andamanese fits the data in the sense that the predicted f4-statistics
are all within two standard errors of statistics computed on the empir- 
ical data (Extended Data Figs 6 and 7 and Extended Data Table 3). 
These results do not imply that an unmixed population related 
anciently to Australasians migrated to the Americas. Although this 
is a formal possibility, an alternative model that we view as more 
plausible is that the ‘Population Y’ (after Ypykuéra, which means 
‘ancestor’ in the Tupi language family spoken by the Surui and 


©2015 Macmillan Publishers Limited. All rights reserved 


[Figure 2a admixture graph node labels: Africans, MA1, Pima, Mixe, Xavante, Surui, Karitiana, Han]


Figure 2 | A model of population history that can explain the excess 
affinity to Oceanians observed in Amazonian populations. a, We fit an 
admixture graph model where a population related to the Andamanese 
Onge contributed a fraction « of the ancestry of ‘Population Y’, which later 
contributed a fraction y to the ancestry of Amazonian groups today 


Karitiana) that contributed Australasian-related ancestry to Amazonians 
was already mixed with a lineage related to First Americans at the time it 
reached Amazonia. When we model such a scenario, we obtain a fit for 
models that specify 2-85% of the ancestry of the Surui, Karitiana and 
Xavante as coming from Population Y (Fig. 2). These results show that 
quite a high fraction of Amazonian ancestry today might be derived from 
Population Y. At the same time, the results constrain the fraction of 
Amazonian ancestry that comes from an Australasian-related population
(via Population Y) to a much tighter range of 1-2% (Fig. 2). 
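Under the admixture graph in Fig. 2a, the fraction of present-day Amazonian ancestry that traces back to the Onge-related lineage is the product of two admixture proportions: the Onge-related share of Population Y, and the share of Population Y in Amazonians. The sketch below uses `x` and `y` as hypothetical names for these two proportions. It shows why a wide range for the Population Y share (2–85%) is compatible with a narrow 1–2% Australasian-related share. It only encodes this bookkeeping identity; the actual fit in Fig. 2b is judged by how many predicted f4-statistics deviate from the empirical values.

```python
def onge_related_share(x, y):
    """Share of Amazonian ancestry from the Onge-related lineage via Population Y.

    x: fraction of Population Y ancestry from the Onge-related lineage (hypothetical name)
    y: fraction of Amazonian ancestry from Population Y (hypothetical name)
    """
    return x * y


def x_range_for_share(y, share_lo=0.01, share_hi=0.02):
    """Interval of x within [0, 1] that keeps x * y inside [share_lo, share_hi], or None."""
    lo = share_lo / y
    hi = min(share_hi / y, 1.0)  # x is a proportion, so it cannot exceed 1
    return (lo, hi) if lo <= hi else None


for y in (0.02, 0.25, 0.85):
    print(y, x_range_for_share(y))
```

A small Population Y share requires Population Y to be largely Onge-related. A large share requires only a few per cent of Onge-related ancestry within Population Y.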

We have shown that a Population Y that had ancestry from a 
lineage more closely related to present-day Australasians than to pre- 
sent-day East Asians and Siberians, likely contributed to the DNA of 
Native Americans from Amazonia and the Central Brazilian Plateau. 
This discovery is striking in light of interpretations of the morphology 
of some early Native American skeletons, which some authors have 
suggested have affinities to Australasian groups. The largest number 
of skeletons that have been described as having this craniofacial mor- 
phology and that date to younger than 10,000 years old have been 
found in Brazil®, the home of the Surui, Karitiana and Xavante groups 
who show the strongest affinity to Australasians in genetic data. 
However, in the absence of DNA directly extracted from a skeleton 
with this morphology, our results are not sufficient to conclude that 
the Population Y we have reconstructed from the genetic data had this 
morphology. 

An open question is when and how Population Y ancestry reached 
South America. There are several archaeological sites in the Americas 
that are contemporary to or earlier than Clovis sites. The fact that the 
one individual from a Clovis context who has yielded ancient DNA 
had entirely First American ancestry’ suggests the possibility that 
Population Y ancestry may be found in non-Clovis sites. Regardless 
of the archaeological associations, our results suggest that the genetic 
ancestry of Native Americans from Central and South America cannot 
be due to a single pulse of migration south of the Late Pleistocene ice 
sheets from a homogeneous source population, and instead must reflect
at least two streams of migration or, alternatively, a long, drawn-out
period of gene flow from a structured Beringian or Northeast Asian 
source. The arrival of Population Y ancestry in the Americas must in 
any scenario have been ancient: while Population Y shows a distant 
genetic affinity to Andamanese, Australian and New Guinean popula- 
tions, it is not particularly closely related to any of them, suggesting 
that the source of Population Y in Eurasia no longer exists; furthermore, we detect no long-range admixture linkage disequilibrium in
Amazonians as would be expected if the Population Y migration had 
occurred within the last few thousand years. Further insight into 
the population movements responsible for these findings should be 
(the remainder of which is related to Mesoamerican Mixe). b, Two-dimensional grid of combinations of the admixture proportions % and y which are compatible with the data in terms of how many predicted f4-statistics deviate by Z = 3.0 from empirical values.


possible through genome-wide analysis of ancient remains from across 
the Americas. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized. The investigators were not blinded to 
allocation during experiments and outcome assessment. 

New Affymetrix Human Origins genotypes. We generated new Affymetrix 
Human Origins array genotypes for 48 individuals from 9 populations from 
present-day Brazil (Apalai, Arara, Guarani_GN, Guarani_KW, Karitiana, Surui, 
Urubu Kaapor, Xavante and Zoro). Ethical approval for the sample collection 
was provided by the Brazilian National Ethics Commission (CONEP Resolution 
no. 123/98). CONEP also approved the oral consent procedure and the use of 
these samples in studies of population history and human evolution. Individual 
and/or tribal informed oral consents were obtained from participants who were 
not able to read or write. All sampling was coordinated by co-authors of this 
study (M.L.P.-E. and F.M.S.) and their collaborators, in a manner consistent 
with the Helsinki Declaration and Brazilian laws and regulations applicable at 
the time of sampling. Logistical support for the sample collection was provided 
by the Fundação Nacional do Índio (FUNAI). We curated the data in the same
way as reported in ref. 19 (Supplementary Information section 1). We computa- 
tionally phased these data together with the previously published Affymetrix 
Human Origins SNP array data using SHAPEIT2 (ref. 31) with default 
parameters. 

High-coverage genome sequencing and processing. We sent samples from 18 
Papuan, Mixe, Surui and Yoruba individuals to Illumina for deep-coverage 
sequencing using a non-PCR-based protocol as part of the Simons Genome 
Diversity Project. The sequence reads were mapped using the ‘aln’ algorithm of 
BWA (version 0.5.10)” and genotypes were inferred using the unified genotyper 
from GATK* (version 2.5.2-gf57256b). These data are available from
https://www.simonsfoundation.org/life-sciences/simons-genome-diversity-project-dataset/.
Briefly, sequence reads were stripped of adapters before alignment to the decoy 
version of the hg19 reference sequence (hs37d5). Read groups were added for 
identification and compatibility with GATK tools, before indel realignment and 
duplicate removal. The genotyping performed thereafter used a reference-free 
procedure that reduces reference bias. A specially developed filtering engine 
assigned filtering levels from 0 to 9 for each position in the genome. All population 
genetic analyses in this paper used the most stringent level of filtering (level 9). 

Testing for more than one ancestral population of Central and South Americans. To investigate whether Central and South American populations are consistent with being derived from a single stream of ancestry, we applied the software qpWave to ask whether the set of f4-statistics of the form

$f_4(A = \text{American}_1, B = \text{American}_2; X = \text{outgroup}_1, Y = \text{outgroup}_2) = (p_A - p_B)(p_X - p_Y)$

(averaged over all SNPs, where $p_A$, $p_B$, $p_X$ and $p_Y$ are the frequencies of an arbitrarily chosen allele in populations A, B, X and Y at each locus) forms a matrix that is consistent with being of rank 0. If all these Native American populations descend from the same stream of migration into the Americas, then the f4-statistic relating each Native American population to each non-Native American population should be the same for all Native American populations, and in particular consistent with 0. Formally, to evaluate whether the f4-statistic matrix is consistent with being of rank 0, we compute a Hotelling’s T² test that appropriately corrects for the correlation structure of the f4-statistics. We analysed
7 Native American populations each with at least 3 individuals with no detected 
post-Columbian admixture, and 4 populations from each of 6 worldwide regions 
as outgroups (Supplementary Information section 2). 
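The rank-0 test can be illustrated with a small NumPy sketch. It is not qpWave itself, and it rests on three simplifying assumptions. First, it uses the basis f4(American_0, American_i; outgroup_0, outgroup_j) for i, j ≥ 1, which spans all f4-statistics of this form. Second, it estimates their covariance with an unweighted delete-one-block jackknife. Third, it returns the Wald/Hotelling-style quadratic form T² = fᵀΣ⁻¹f together with its degrees of freedom, and leaves the choice of reference distribution to the caller. The function names are hypothetical.

```python
import numpy as np

def f4_basis(freq_am, freq_out):
    """Per-SNP f4(Am_0, Am_i; Out_0, Out_j) = (p_Am0 - p_Ami)(p_Out0 - p_Outj), i, j >= 1.

    freq_am: (n_american, n_snps); freq_out: (n_outgroup, n_snps).
    Returns an array of shape ((n_american - 1) * (n_outgroup - 1), n_snps).
    """
    d_am = freq_am[0] - freq_am[1:]
    d_out = freq_out[0] - freq_out[1:]
    return (d_am[:, None, :] * d_out[None, :, :]).reshape(-1, freq_am.shape[1])

def rank0_t2(freq_am, freq_out, block_ids):
    """Quadratic-form T^2 for 'all f4-statistics are zero'; returns (T2, dof)."""
    per_snp = f4_basis(freq_am, freq_out)
    theta = per_snp.mean(axis=1)                       # genome-wide f4 vector
    blocks = np.unique(block_ids)
    g = blocks.size
    # Leave-one-block-out estimates of the f4 vector
    loo = np.array([per_snp[:, block_ids != b].mean(axis=1) for b in blocks])
    centred = loo - loo.mean(axis=0)
    cov = (g - 1) / g * centred.T @ centred            # jackknife covariance of theta
    t2 = float(theta @ np.linalg.solve(cov, theta))
    return t2, theta.size
```

If SciPy is installed, a large-sample P value could be taken as `scipy.stats.chi2.sf(t2, dof)`. The reference distribution that qpWave itself uses may differ.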

D-statistic tests based on correlation in allele frequencies. To investigate 
whether a tree-like population history ((A, B),(X, Y)) is consistent with the data, 
for example, with A = chimpanzee, B = Onge, X = Mixe and Y = Surui, we computed D-statistics'**'

$D(A, B; X, Y) = \dfrac{\sum (p_A - p_B)(p_X - p_Y)}{\sum (p_A + p_B - 2p_A p_B)(p_X + p_Y - 2p_X p_Y)}$

with sums over all SNPs, where $p_A$, $p_B$, $p_X$ and $p_Y$ are the frequencies of an arbitrarily chosen
allele in populations A, B, X and Y at each locus. We computed standard errors 
using a block jackknife weighted by the number of SNPs in each 5 cM (5 Mb in the 
case of high-coverage genome sequences) block in the genome*”. We report 
Z-scores as normalized Z = D/s.e. and we interpret statistics |Z| > 3 as being 
significantly different from 0. We only considered SNPs that were informative, 
in the sense that they are polymorphic both within (A,B) and (X,Y). 
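A D-statistic with a weighted block jackknife can be sketched as follows. The sketch assumes that allele frequencies are already available as NumPy arrays and that each SNP carries a block label (for example its 5 cM window). The standard error uses the delete-one-block jackknife formula for unequal block sizes, with blocks weighted by informative SNPs; this is our assumption about the exact estimator, not a detail stated here. Function names are hypothetical.

```python
import numpy as np

def d_stat(pa, pb, px, py):
    """Ratio-of-sums D(A, B; X, Y) over the SNPs supplied."""
    num = (pa - pb) * (px - py)
    den = (pa + pb - 2 * pa * pb) * (px + py - 2 * px * py)
    return num.sum() / den.sum()

def informative(pa, pb, px, py):
    """SNPs polymorphic within (A, B) and within (X, Y): not fixed for one allele in both."""
    fixed_ab = ((pa == 0) & (pb == 0)) | ((pa == 1) & (pb == 1))
    fixed_xy = ((px == 0) & (py == 0)) | ((px == 1) & (py == 1))
    return ~fixed_ab & ~fixed_xy

def d_stat_jackknife(pa, pb, px, py, block_ids):
    """Return (D, s.e., Z) using a weighted delete-one-block jackknife."""
    keep = informative(pa, pb, px, py)
    pa, pb, px, py, block_ids = pa[keep], pb[keep], px[keep], py[keep], block_ids[keep]
    theta = d_stat(pa, pb, px, py)
    blocks, sizes = np.unique(block_ids, return_counts=True)
    n, g = sizes.sum(), blocks.size
    loo = np.empty(g)
    for k, b in enumerate(blocks):
        m = block_ids != b                      # drop block b
        loo[k] = d_stat(pa[m], pb[m], px[m], py[m])
    h = n / sizes
    theta_j = g * theta - np.sum((1 - sizes / n) * loo)
    pseudo = h * theta - (h - 1) * loo          # pseudovalues per block
    se = np.sqrt(np.mean((pseudo - theta_j) ** 2 / (h - 1)))
    return theta, se, theta / se
```

With A = chimpanzee, B = Onge, X = Mixe and Y = Surui, a positive Z indicates excess allele sharing between Onge and Surui, the sign convention used in the main text.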

Correlation of signal to regions of functional importance. We divided the 
genome into 10 deciles of the ‘B-value’ described in ref. 24, which integrates 
multiple genomic annotations into a single estimate of proximity to functional 
regions for each nucleotide in the genome. We then used linear regression to 
estimate the coefficient a of the function y = ax + c, where x = B (the rank of the decile of B) and y = D_B (D restricted to the particular decile of B). To compute standard errors, we used a weighted block jackknife procedure where each 5 Mb block of the genome is dropped in turn and a is recomputed. The variability of a across these leave-one-out computations, weighting by the number of informative loci in each block, was used to estimate a standard error****.
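The B-value regression can be sketched in the same style. The sketch computes D separately in each decile, fits a straight line with `numpy.polyfit`, and obtains a standard error for the slope by dropping each genomic block in turn. It assumes that decile labels (1–10) have been precomputed from the B-value map and that the inputs are already restricted to informative loci, so block sizes equal informative loci per block. The jackknife formula for unequal block sizes is an assumption, and all names are hypothetical.

```python
import numpy as np

def d_stat(pa, pb, px, py):
    """Ratio-of-sums D(A, B; X, Y) over the SNPs supplied."""
    num = (pa - pb) * (px - py)
    den = (pa + pb - 2 * pa * pb) * (px + py - 2 * px * py)
    return num.sum() / den.sum()

def decile_slope(pa, pb, px, py, decile):
    """Least-squares slope a of D_B = a * x + c, with x = decile rank 1..10."""
    xs = np.arange(1, 11)
    ys = [d_stat(pa[decile == q], pb[decile == q], px[decile == q], py[decile == q])
          for q in xs]
    a, _c = np.polyfit(xs, ys, 1)
    return a

def slope_with_jackknife(pa, pb, px, py, decile, block_ids):
    """Return (a, s.e., Z); each block is dropped in turn and a is recomputed."""
    theta = decile_slope(pa, pb, px, py, decile)
    blocks, sizes = np.unique(block_ids, return_counts=True)
    n, g = sizes.sum(), blocks.size
    loo = np.empty(g)
    for k, b in enumerate(blocks):
        m = block_ids != b
        loo[k] = decile_slope(pa[m], pb[m], px[m], py[m], decile[m])
    h = n / sizes                               # weights from loci per block
    theta_j = g * theta - np.sum((1 - sizes / n) * loo)
    pseudo = h * theta - (h - 1) * loo
    se = np.sqrt(np.mean((pseudo - theta_j) ** 2 / (h - 1)))
    return theta, se, theta / se
```

The sign of the slope depends on how deciles are ranked. In the main text, the reported negative slope corresponds to larger |D| closer to functionally important regions.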

h4-statistic tests based on correlation in linkage disequilibrium. We devised a
linkage disequilibrium statistic that tests for symmetry in linkage disequilibrium
between two proposed clades with a pair of populations in each. The statistic, h4, is:

$h_4 = \left[(p^A_{12} - p^A_1 p^A_2) - (p^B_{12} - p^B_1 p^B_2)\right] \times \left[(p^C_{12} - p^C_1 p^C_2) - (p^D_{12} - p^D_1 p^D_2)\right]$

where 1 and 2 are arbitrarily chosen reference alleles at two different loci, respectively, and A, B, C and D denote four different populations. Thus, $p^A_{12}$ is the
frequency of the 12 haplotype in population A, and $p^A_1$ is the frequency of the
1 allele in population A. The quantity $p^A_{12} - p^A_1 p^A_2$ thus measures the difference
between the observed haplotype frequency and the expected haplotype frequency 
given the allele frequencies*®. The motivation for this statistic being informative 
about population history is that under a tree-like model ((A, B), (C, D)) with no 
gene flow, differences in linkage disequilibrium between populations A and B are 
not expected to correlate to differences in linkage disequilibrium between popula- 
tions C and D. If there has been gene flow between the two clades, the statistic may 
be significantly positive or negative, like f4- and D-statistics'®.

In practice, we computed this statistic for each polymorphic locus (‘target locus’) by identifying all other polymorphic loci 5′ of the target locus at distance
interval d ± w and computing the statistic for each pairing. We then averaged the
statistic over all valid pairs of loci in the genome identified in this way. We 
computed standard errors using a block jackknife over contiguous 5 cM blocks 
in the genome, where SNP pairs that bridge the boundary of two blocks are 
assigned to the block in which the target locus is found. For the main analysis 
we computed h4-statistics of the form h4(Yoruba, X; Mixe, Surui) for all populations X genotyped using the Affymetrix Human Origins SNP array, and all pairs of
SNPs within 0.01 cM of each other. We restricted the analysis to populations with
at least 10 individuals. We also computed the h4-statistic for windows of 0.001 cM
centred around different genetic distances for selected populations (Extended 
Data Fig. 5). 
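The pair-selection and blocking scheme described above can be sketched as follows. This is a hypothetical illustration rather than the authors' code: it assumes a single chromosome with SNPs sorted by genetic map position (cM), haplotypes stored as lists of 0/1 values (1 = reference allele), and reads "5′ of the target locus" as "at a smaller map coordinate". Each pair is credited to the 5 cM block of its target locus, so pairs that straddle a block boundary follow the target.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = List[List[int]]  # haplotypes x SNPs; 1 = reference allele (assumed encoding)


def ld_dev(haps: Matrix, i: int, j: int) -> float:
    """p_12 - p_1 * p_2 for loci i and j in one population."""
    n = len(haps)
    p1 = sum(h[i] for h in haps) / n
    p2 = sum(h[j] for h in haps) / n
    p12 = sum(h[i] * h[j] for h in haps) / n
    return p12 - p1 * p2


def polymorphic(pops: List[Matrix], k: int) -> bool:
    """True if both alleles occur at locus k in the pooled sample."""
    return len({h[k] for pop in pops for h in pop}) == 2


def h4_genome_average(pos_cm: List[float], a: Matrix, b: Matrix, c: Matrix, d: Matrix,
                      dist: float, width: float, block_cm: float = 5.0
                      ) -> Tuple[float, Dict[int, Tuple[float, int]]]:
    """Mean h4(A, B; C, D) over SNP pairs separated by dist +/- width cM.

    Returns the mean and, per block index, (sum of h4 values, number of pairs).
    """
    pops = [a, b, c, d]
    lo, hi = dist - width, dist + width
    blocks: Dict[int, list] = defaultdict(lambda: [0.0, 0])
    for i, x_i in enumerate(pos_cm):            # i is the target locus
        if not polymorphic(pops, i):
            continue
        block = int(x_i // block_cm)            # pair is assigned to the target's block
        for j in range(i - 1, -1, -1):          # loci at smaller map coordinates
            sep = x_i - pos_cm[j]
            if sep > hi:
                break                           # input is sorted: only farther loci remain
            if sep < lo or not polymorphic(pops, j):
                continue
            value = (ld_dev(a, i, j) - ld_dev(b, i, j)) * (ld_dev(c, i, j) - ld_dev(d, i, j))
            blocks[block][0] += value
            blocks[block][1] += 1
    total = sum(s for s, _ in blocks.values())
    count = sum(n for _, n in blocks.values())
    mean = total / count if count else float("nan")
    return mean, {k: (s, n) for k, (s, n) in blocks.items()}
```

For the main analysis (all pairs within 0.01 cM), one would call it with `dist=0.005, width=0.005`, which selects separations in [0, 0.01] cM; the per-block sums and pair counts are the quantities a block jackknife over 5 cM blocks would resample.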
Chromosome painting symmetry tests. We used SHAPEIT to phase 593,142 
SNPs with the same set of individuals as described above, using all autosomal SNPs 
in the Affymetrix Human Origins array. We then ‘painted’ unadmixed Native American individuals using non-American populations as donors, excluding the Yukagir and the Chukchi because they show evidence of back-migration from the Americas.
We ran ChromoPainter v2 using default parameters, painting each recipient indi- 
vidual separately, but using all donor populations as candidates to paint each 
recipient haplotype. To assess statistical uncertainty, we repeated this procedure 
for each recipient individual using 22 subsets of the data where for each of these 
subsets a different autosome had been dropped. We then used the results of these 
22 block jackknife pseudo-replicates to obtain a weighted block jackknife estimate 
of the standard error for our test statistic (see below). 

To test whether the recipient populations copied equally from the donor populations, we computed the average ‘chunk count’ $C_{R:D}$ copied from a given donor population D in each recipient population R (averaged over individuals). We then computed a statistic $S(D; R_1, R_2)$ that quantifies the symmetry between two Native American populations in their copying from each donor:

$$S(D; R_1, R_2) = \frac{C_{R_1:D} - C_{R_2:D}}{C_{R_1:D} + C_{R_2:D}}$$


If two Native American populations, such as the Surui and the Mixe, derive all of their ancestry from a single common origin, we expect them to copy from the donor populations at equal rates. We computed the standard error of this statistic using the 22 subsets of the data in which each autosome had been dropped, weighting each subset by the number of SNPs on the dropped chromosome. We generated the world map in Fig. 1d by using the R maps package to plot the value of S(X; Mixe, Surui + Karitiana) for each non-American population X, and S(Onge; Mixe, Y) for each American population Y.

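The S statistic and its chromosome-dropping standard error can be sketched as below. The sketch assumes (hypothetically) that chunk counts are available per chromosome and are additive, so that the estimate with one autosome dropped is obtained by subtracting that chromosome's counts rather than by re-running ChromoPainter as described above. For the weighting it uses the weighted delete-one-block jackknife of Busing, Meijer and van der Leeden (1999), with block sizes equal to the number of SNPs per chromosome; the text states only that the jackknife was weighted by SNP counts, so this particular formula is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def s_stat(c_r1: float, c_r2: float) -> float:
    """S(D; R1, R2) from the average chunk counts R1 and R2 copy from donor D."""
    return (c_r1 - c_r2) / (c_r1 + c_r2)


def weighted_jackknife(full: float, loo: Sequence[float],
                       sizes: Sequence[int]) -> Tuple[float, float]:
    """Weighted delete-one-block jackknife.

    full  : estimate using all blocks
    loo   : estimate with block j left out, for each block j
    sizes : block sizes m_j (here, SNPs per chromosome); needs >= 2 blocks
    Returns (jackknife estimate, standard error).
    """
    n, g = sum(sizes), len(sizes)
    est = g * full - sum((1 - m / n) * t for m, t in zip(sizes, loo))
    var = 0.0
    for m, t in zip(sizes, loo):
        h = n / m
        pseudo = h * full - (h - 1) * t      # pseudo-value for block j
        var += (pseudo - est) ** 2 / (h - 1)
    return est, math.sqrt(var / g)


def s_with_se(chunks_r1: List[float], chunks_r2: List[float],
              snps_per_chrom: List[int]) -> Tuple[float, float]:
    """Per-chromosome chunk counts -> S and its jackknife standard error."""
    t1, t2 = sum(chunks_r1), sum(chunks_r2)
    full = s_stat(t1, t2)
    loo = [s_stat(t1 - a, t2 - b) for a, b in zip(chunks_r1, chunks_r2)]
    return weighted_jackknife(full, loo, snps_per_chrom)
```

With equal block sizes the weighted form reduces to the ordinary delete-one jackknife; a single block is not allowed because h_j − 1 would be zero.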
Admixture graph models of population relationships. We used ADMIXTUREGRAPH to fit proposed phylogenies with admixture events to the data. We assessed goodness-of-fit by examining all possible f-statistics predicted by the fitted model and testing whether they differed significantly from the empirical data. As a starting point we used the model relating Mbuti Africans, Andamanese Onge, MA1 and Karitiana fitted by a previous study, in which lineages related to MA1 and the Onge both contributed ancestry to the Karitiana. We added Han Chinese to this model to represent a population that is phylogenetically more closely related to one of the ancestral populations of Native Americans than are the Onge (Extended Data Figs 6 and 7). We find that this model is inconsistent with the data: it predicts that Mixe and Surui/Karitiana are equally related to Onge, yet we observe several statistics for which the Z-score for the difference between the predicted and empirical statistics is |Z| > 3 (Extended Data Table 3). To account for this, we fitted a model in which the ancestors of Amazonians received admixture from a population related to the Onge (Extended Data Fig. 6), and found that this provides an excellent fit to the data, with no |Z|-scores greater than 3. In contrast, alternative models of Han-related or MA1-related gene flow into the Americas are inconsistent with the data (Extended Data Fig. 6 and Extended Data Table 3).

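The goodness-of-fit criterion described above (flag any fitted f-statistic whose prediction lies more than three standard errors from the empirical value) can be sketched as follows. This is not ADMIXTUREGRAPH itself: the predicted values, empirical values and standard errors are assumed to be already available, and the `f4` helper simply averages (pA − pB)(pC − pD) over SNPs from allele frequencies. The sign convention, Z = (empirical − predicted)/SE, matches the signs in Extended Data Table 3.

```python
from typing import Iterable, List, Sequence, Tuple


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D) from allele frequencies: mean over SNPs of (pA - pB)(pC - pD)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)


def misfits(fits: Iterable[Tuple[str, float, float, float]],
            threshold: float = 3.0) -> List[Tuple[str, float]]:
    """Return (label, Z) for each fitted statistic with |Z| > threshold.

    Each entry is (label, predicted, empirical, standard_error), and
    Z = (empirical - predicted) / standard_error.
    """
    flagged = []
    for label, predicted, empirical, se in fits:
        z = (empirical - predicted) / se
        if abs(z) > threshold:
            flagged.append((label, z))
    return flagged
```

A graph is then accepted, in the sense used above, when `misfits` returns an empty list for every f-statistic it predicts.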
Code availability. A Python program for computing h4 symmetry statistics and other population genetic statistics used in this paper is available at https://github.com/pontussk/popstats.
Extended Data Figure 1 | Clustering analysis. ADMIXTURE clustering analysis performed on the Affymetrix Human Origins data used in this study. To aid in visualization, we show results only for Native American samples and for selected samples from Eurasian populations.
Extended Data Figure 2 | qpWave coefficients. Weights from qpWave for Native American populations and for non-American outgroup populations. No weights are given for Yoruba and Cabecar, as they are used in the computation.
Extended Data Figure 3 | Excess allele sharing between the Surui and the Onge. a, Tests for excess shared derived alleles with the Onge in all possible comparisons of 8 Surui and 10 Mixe individuals. All Mixe–Surui comparisons show a positive skew, whereas all Mixe–Mixe and Surui–Surui comparisons are consistent with 0. Lines correspond to one standard error in either direction. b, Random sequencing or genotyping errors cannot explain the affinity of the Amazonians to Australasians, as simulated increased error rates in the Onge do not cause an increased affinity to the Surui.
Extended Data Figure 4 | Signals of admixture as a function of proximity to functional regions. a, The affinity of 16 Papuan high-coverage genomes to 2 Amazonian Surui high-coverage genomes as a function of proximity to regions of functional importance (measured by B-value). b, A total of 395 tests of quartets D(Yoruba, X; Y, Z) shows that quartets with significantly positive slopes (|Z| > 3) also yield significant genome-wide D-statistics of the opposite sign. This suggests that signals of admixture are systematically stronger close to functionally important regions.
populations with at least 6 samples. c, d, We computed D(Yoruba, X; Y, Z) and h4(Yoruba, X; Y, Z) for many combinations of populations as X, Y and Z using
h4-statistic with the genetic distance separation of pairs of SNPs for h4(Yoruba, X; Mixe, Surui).
Extended Data Figure 6 | Admixture graphs for fitted population history models. a, An admixture graph where all Mixe, Surui and Karitiana are of 100% First American ancestry is rejected with 6 predicted f-statistics at least 3 standard errors from the empirically observed value. b, An admixture graph where the ancestors of Surui and Karitiana receive 2% ancestry from a lineage related to the Onge is consistent with the data with no outliers. c, An admixture graph where the distinct ancestry in Amazonians is more closely related to Han than to Onge produces 6 outliers. d, An admixture graph with no distinctive ancestry in Karitiana or Surui but East Asian gene flow into the Mixe produces 7 outliers. e, An admixture graph with no distinctive ancestry in Karitiana or Surui but MA1-related gene flow into the Mixe produces 6 outliers.
Extended Data Table 1 | qpWave analysis provides evidence that Central and South American genetic variation is inconsistent with being derived from a single homogeneous population

                                     P-value for this number of streams
                                     1          2       3       4
Full data                            2.03E-07   0.09    0.58
Outgroup region dropped
  Africa                             1.67E-04   0.34    0.92
  C. Asia/Siberia                    5.91E-07   0.11    0.6
  East Asia                          4.46E-09   0.04    0.57
  South Asia                         6.95E-05   0.1     0.4
  West Eurasia                       1.41E-05   0.06    0.37
  Oceania                            4.39E-05   0.43    0.88
Native American population dropped
  Cabecar                            1.13E-08
  Guarani                            9.50E-07
  Karitiana                          1.41E-06
  Mixe                               8.32E-03
  Piapoco                            1.30E-04
  Pima                               2.19E-05
  Surui                              2.35E-06
Africa + 1 other region
  Siberia
  East Asia
  South Asia
  West Eurasia
  Oceania
Siberia + 1 other region
  Africa
Extended Data Table 2 | Top 20 D-statistics observed for D(chimpanzee, Old World population; Central Americans, Amazonians)

Rank  Population      D       SE      Z     Region 1                  Region 2
1     Onge            0.0101  0.0022  4.60  India                     South Asia
2     Papuan          0.0084  0.0022  3.82  Papua New Guinea          Oceania
3     New_Guinea      0.0082  0.0023  3.54  Papua New Guinea          Oceania
4     Australian WGA  0.0074  0.0024  3.12  Australia (Arnhem Land)   Oceania
5     Mamanwa         0.0068  0.0020  3.40  Philippines (Negrito)     Oceania
6     Bougainville    0.0065  0.0023  2.85  Papua New Guinea          Oceania
7     Kharia          0.0059  0.0020  –     India                     South Asia
8     Tongan          0.0058  0.0022  2.68  Tonga                     Oceania
9     Bengali         0.0058  0.0019  3.00  Bangladesh                South Asia
10    Mala            0.0055  0.0019  2.93  India                     South Asia
11    Ami             0.0052  0.0020  2.61  Taiwan                    East Asia
12    Lodhi           0.0052  0.0019  –     India                     South Asia
13    Sindhi          0.0051  0.0019  2.12  Pakistan                  South Asia
14    Kusunda         0.0050  0.0020  2.56  Nepal                     South Asia
15    Lahu            0.0050  0.0021  2.31  China                     East Asia
16    Kinh            0.0049  0.0020  2.46  Vietnam                   East Asia
17    Australian      0.0048  0.0025  1.96  Australia                 Oceania
18    Balochi         0.0047  0.0019  –     Pakistan                  South Asia
19    Thai            0.0047  0.0020  2.38  Thailand                  East Asia
20    Semende         0.0045  0.0020  –     Indonesia (Sumatra)       Oceania
Extended Data Table 3 | f4-statistics for which the statistic predicted by the fitted admixture graphs deviates from the statistic computed on the empirical data by |Z| > 3

A      B      X     Y          Predicted f4   Empirical f4   Z-score
Single First American origin (Extended Data Figure 6A)
Mbuti  Onge   Mixe  Surui      0              0.003506        3.530
Mbuti  Onge   Mixe  Karitiana  0              0.00315         3.431
Onge   Mixe   Mixe  Surui      -0.018466      -0.021724      -3.061
Onge   Mixe   Mixe  Karitiana  -0.018466      -0.021849      -3.226
Onge   Han    Mixe  Surui      0              -0.002902      -3.654
Onge   Han    Mixe  Karitiana  0              -0.00239       -3.279
Onge-related ancestry in the Amazon (Extended Data Figure 6B)
(No outliers)
East Asian admixture in South America (Extended Data Figure 6C)
Mbuti  Onge   Mixe  Surui      0              0.003506        3.530
Mbuti  Onge   Mixe  Karitiana  0              0.00315         3.431
Onge   Mixe   Mixe  Surui      -0.018466      -0.021724      -3.061
Onge   Mixe   Mixe  Karitiana  -0.018466      -0.021849      -3.226
Onge   Han    Mixe  Surui      0              -0.002902      -3.654
Onge   Han    Mixe  Karitiana  0              -0.00239       -3.279
East Asian admixture in Central America (Extended Data Figure 6D)
Mbuti  Onge   Mixe  Surui      -0.000002      0.003506       –
Mbuti  Onge   Mixe  Karitiana  -0.000002      0.00315         3.433
Onge   Mixe   Mixe  Surui      -0.018466      -0.021724      -3.061
Onge   Mixe   Mixe  Karitiana  -0.018466      -0.021849      -3.225
Onge   Han    Mixe  Surui      -0.000004      -0.002902      -3.649
Onge   Han    Mixe  Karitiana  -0.000004      -0.00239       -3.273
Ancient Siberian (MA1) admixture in Central America (Extended Data Figure 6E)
Mbuti  Onge   Mixe  Surui      0              0.003506        3.530
Mbuti  Onge   Mixe  Karitiana  0              0.00315         3.431
Onge   Mixe   Mixe  Surui      -0.018470      -0.021724      -3.057
Onge   Mixe   Mixe  Karitiana  -0.018470      -0.021849      -3.222
Onge   Han    Mixe  Surui      0              -0.002902      -3.654
Onge   Han    Mixe  Karitiana  0              -0.00239       -3.279
Mutations in DCHS1 cause mitral valve prolapse
Mitral valve prolapse (MVP) is a common cardiac valve disease that affects nearly 1 in 40 individuals. It can manifest as mitral regurgitation and is the leading indication for mitral valve surgery. Despite a clear heritable component, the genetic aetiology leading to non-syndromic MVP has remained elusive. Four affected individuals from a large multigenerational family segregating non-syndromic MVP underwent capture sequencing of the linked interval on chromosome 11. We report a missense mutation in the DCHS1 gene, the human homologue of the Drosophila cell polarity gene dachsous (ds), that segregates with MVP in the family. Morpholino knockdown of the zebrafish homologue dachsous1b resulted in a cardiac atrioventricular canal defect that could be rescued by wild-type human DCHS1, but not by DCHS1 messenger RNA carrying the familial mutation. Further genetic studies identified two additional families in which a second deleterious DCHS1 mutation segregates with MVP. Both DCHS1 mutations reduce protein stability, as demonstrated in zebrafish, cultured cells and, notably, in mitral valve interstitial cells (MVICs) obtained during mitral valve repair surgery of a proband. Dchs1+/− mice had prolapse of thickened mitral leaflets, which could be traced back to developmental errors in valve morphogenesis. DCHS1 deficiency in MVP patient MVICs, as well as in Dchs1+/− mouse MVICs, results in altered migration and cellular patterning, supporting these processes as aetiological underpinnings for the disease. Understanding the role of DCHS1 in mitral valve development and MVP pathogenesis holds potential for therapeutic insights into this very common disease.

In a previous study, based on specific diagnostic criteria, MMVP2 (myxomatous mitral valve prolapse-2) was mapped to a 4.3 cM region of chromosome 11p15.4 in a family of Western European descent segregating non-syndromic mitral valve prolapse as an autosomal dominant trait with age-dependent penetrance (Fig. 1a, c). We performed tiled capture and high-throughput sequence analysis of genomic DNA from four affected individuals (Fig. 1a), identifying 4,891 single nucleotide variants (SNVs) and insertion/deletion polymorphisms in the targeted region (see Methods). After selecting rare protein-coding variants shared among all affected pedigree members, we identified three heterozygous protein-altering variants: two missense SNVs in DCHS1, a member of the cadherin superfamily, resulting in p.P197L and p.R2513H (Fig. 1b), and a single missense SNV in APBB1, the amyloid beta (A4) precursor protein-binding family B, member 1 gene, resulting in p.R481H. Both DCHS1 mutations, p.P197L and p.R2513H, were rare in the population (the former observed three times in 4,300 European-American individuals from the NHLBI Exome Sequencing Project and the latter never observed), and both were predicted to be protein damaging by PolyPhen-2 (ref. 11), LRT and MutationTaster. Although the APBB1 variant was also rare in population-based data, no cardiac phenotype was observed in apbb1 morphant zebrafish, despite reduction of apbb1 mRNA (Extended Data Fig. 1a, b). Additionally, Apbb1 is not expressed in murine cardiac valves (Extended Data Fig. 2), and no cardiac defects have been reported in the Apbb1 knockout mouse. This suggests that the APBB1 variant is unlikely to contribute to MVP in this family.
The functional effect of the DCHS1 variants was evaluated in the zebrafish Danio rerio, as this model system lends itself to functional annotation of mutations implicated in human disease. Zebrafish have two DCHS1 homologues, dachsous1a and dachsous1b. dachsous1b is located in a region of D. rerio chromosome 10 that is syntenic to the DCHS1 region of human chromosome 11. Knockdown of dachsous1a did not result in a cardiac phenotype despite reduction in mRNA levels (Extended Data Fig. 1a, b); however, knockdown of dachsous1b (dchs1b) led to significant changes in cardiac morphology (Fig. 2a; Extended Data Fig. 1a). Control zebrafish hearts undergo looping and develop an atrioventricular constriction by 48 h post-fertilization (hpf), whereas dchs1b knockdown disrupts this process, resulting in impaired formation of the atrioventricular constriction (Fig. 2a, b). While control embryos have unidirectional blood flow between the atrium and ventricle at 72 hpf (Supplementary Video 4), dchs1b knockdown causes regurgitation of blood from the ventricle into the atrium (Supplementary Video 5). An atrioventricular canal defect was defined as failure of cardiac looping combined with any atrioventricular regurgitation at 72 hpf. Using a high morpholino dose (1.5 ng) to establish the phenotype, the prevalence of atrioventricular canal defects was 76% (n = 170), whereas spontaneous cardiac defects were rarely observed in controls (0.5%, n = 205) (Fig. 2b). Whole-mount in situ hybridization of dchs1b confirmed predominant expression at the atrioventricular junction at 54 and 72 hpf, corresponding to the timing of the defects observed in the morphants (Extended Data Fig. 3a–c). We evaluated gene expression patterns in the developing atrioventricular ring and observed that bmp4 expression is expanded into the ventricle at 48 hpf in dchs1b knockdown embryos, whereas it is restricted to the atrioventricular ring in controls (Extended Data Fig. 4a, b). Additionally, has2 expression was not detectable at 48 hpf, and was only faintly detectable at 72 hpf, in the dchs1b knockdown (Extended Data Fig. 4i–l). To test mutation pathogenicity in this model, rescue experiments were performed using both wild-type human DCHS1 and P197L/R2513H mutant mRNA, which were injected into dchs1b knockdown zebrafish with a lower dose of morpholino (0.75 ng) to minimize combined morpholino/mRNA toxicity. Human wild-type DCHS1 mRNA rescued the atrioventricular canal defect observed upon dchs1b knockdown, whereas injection of an equimolar amount of mutant DCHS1 mRNA failed to rescue (Fig. 2c). Injection of the mutant DCHS1 mRNA alone did not cause atrioventricular canal defects, supporting a loss-of-function mechanism for the DCHS1 mutation.

Having demonstrated segregation of a loss-of-function DCHS1 mutation with MVP in our large pedigree, we sought to determine whether genetic variation in DCHS1 plays a role in MVP beyond the linked family. By evaluating a cohort of MVP patients, we identified two additional families in which MVP segregated with the novel DCHS1 protein variant p.R2330C (Fig. 1d–g). The proband of family 2 underwent surgical mitral valve repair for severe MVP and mitral regurgitation at age 21. Valve tissue was resected to repair the posterior leaflet, and examination of the tissue showed classic myxomatous degeneration (Extended Data Fig. 5). Mitral valve interstitial cells were isolated from a portion of the posterior leaflet resected during surgery, providing a unique resource for the functional studies described below. The proband’s sister, evaluated at age 27 and also heterozygous for the p.R2330C mutation, demonstrated classical MVP with thickened leaflets and moderate regurgitation. The father and maternal grandmother were unaffected and do not carry the mutation. The mother (age 49) and maternal grandfather (age 76) are both affected with mild MVP and both carry p.R2330C. In family 3, the proband (Fig. 1e) had moderate to severe heart failure owing to severe mitral regurgitation with posterior leaflet prolapse requiring surgery at age 72. The proband’s sister, age 69, is unaffected and negative for the p.R2330C mutation. The son (age 52) and the daughter (age 53) are both affected with MVP and both carry p.R2330C. The second son (age 55) also carries p.R2330C; however, his MVP status is indeterminate owing to mild left ventricular inferior wall hypokinesis that tethers the leaflets down into the left ventricular cavity, masking leaflet prolapse motion towards the left atrium.

Figure 1 | Pedigrees, mutation, and phenotype. Black symbols, MVP affected; grey, unknown; arrows, probands. If no genotype is shown, the individuals were unavailable for study. a, Pedigree linked to chromosome 11. #, Individuals under 15 years of age; *, individuals sequenced. Genotypes for the DCHS1 mutation c.7538G>A (R2513H) are shown. b, DNA sequence of c.7538G>A (p.R2513H). c, Two-dimensional echocardiographic long-axis view of family 1 proband. Dashed line marks mitral annulus. d, e, Family 2 and 3 pedigrees. Genotype c.6988C>T (p.R2330C) shown. f, DNA sequence c.6988C>T (p.R2330C). g, Two-dimensional echocardiographic long-axis view of family 2 proband.

Figure 2 | Zebrafish Dchs1b is required for atrioventricular canal development. a, By 72 hpf, zebrafish hearts develop a constriction in the atrioventricular canal (AVC) that separates the atrium (a) from the ventricle (v). b, Knockdown of dchs1b results in absence of the atrioventricular constriction (bracket). c, Approximately 75% of dchs1b morphants exhibit AVC defects (*P = 1 × 10 ©’). Human DCHS1 mRNA rescues the dchs1b morpholino AVC phenotype, whereas human mutant DCHS1 mRNA (P197L/R2513H) fails to rescue the phenotype (**P = 0.009). The total number of fish analysed was 611, and statistical values were obtained using Fisher’s exact test.

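The zebrafish comparisons in Fig. 2c were evaluated with Fisher's exact test. A dependency-free sketch of the two-sided test on a 2 × 2 table (rows: conditions, for example morphant versus rescue; columns: fish with and without an AVC defect) is given below as an illustration. It uses the common convention of summing the probabilities of all tables with the same margins that are no more probable than the observed one; other two-sided conventions exist, and the counts in the example are textbook placeholders, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the P value sums the probabilities of all
    tables that are no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps tables that tie with the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 ≈ 0.486, the standard value for that table.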
To evaluate the functional consequence of the DCHS1 mutations, we quantified protein levels in cells transfected with either wild-type or variant (p.P197L/p.R2513H) DCHS1 complementary DNA constructs. Expression of the mutant DCHS1 protein was ~60% lower than that of wild-type, with no significant change in mRNA levels, suggesting that the DCHS1 variants reduce protein stability (Fig. 3a–c). Cycloheximide treatment revealed that wild-type DCHS1 protein in transfected cells had a half-life of 5.8 h, whereas the mutant DCHS1 protein (p.P197L/p.R2513H) had a half-life of 1.6 h (Fig. 3d, e). DCHS1 constructs harbouring either p.P197L or p.R2513H alone were also evaluated, showing that p.R2513H markedly reduced protein levels and implicating this variant as pathogenic in the family (Fig. 1, Extended Data Fig. 6). A similar analysis of DCHS1 protein half-life was conducted using p.R2330C MVICs from the proband of family 2. Consistent with the data obtained from the p.R2513H transfectants, these studies showed a significant reduction in protein half-life compared with control MVICs (t1/2 = 0.46 h versus 1.73 h; Fig. 3d, e). Together, our studies show that p.R2513H and p.R2330C result in DCHS1 loss of function. To evaluate DCHS1 loss of function in a mammalian model, we analysed Dchs1-deficient mice for phenotypic similarities to human MVP. Homozygous knockout of Dchs1 in mice results in neonatal lethality and multi-organ impairment; however, the relevant genetic model for human MVP is the heterozygous Dchs1 mouse. Dchs1+/− mice exhibit mitral valve prolapse with pronounced involvement of the posterior leaflet, which is elongated and shifts the leaflet coaptation anteriorly (Fig. 4), as in the proband from family 1 (Fig. 1c, Supplementary Videos 1–3, 6, 7). Micro-MRI analyses and 3D reconstructions of adult Dchs1+/− mice reveal prominent posterior leaflet thickening with quantitative increases in valve volume (Fig. 4 and Supplementary Video 8). All other echocardiographic measurements were unchanged (Extended Data Fig. 7). Histological and molecular characterization of Dchs1+/− mice confirmed leaflet thickening and showed myxomatous degeneration with increased proteoglycan accumulation in both mitral leaflets (Fig. 4). These results clearly show that Dchs1 heterozygosity results in mitral valve prolapse in mice.

Figure 3 | DCHS1 mutations result in diminished protein levels. a–c, Western blot: (p.P197L/p.R2513H) mutant DCHS1 results in a 60% decrease in protein with no change in RNA expression. d, Left panel, DCHS1 wild-type (WT) or mutant (p.P197L/p.R2513H) transfectants treated with cycloheximide (CHX) for specified times followed by western blot analyses. Right panel, cycloheximide on control (DCHS1 WT) or MVP patient (p.R2330C) MVICs. Tubulin, loading control. e, Calculated protein half-lives. WT and mutant transfectant half-lives are 5.8 h versus 1.6 h, respectively (left). Control and mutant DCHS1 half-lives are 1.73 h versus 0.46 h, respectively (right). Analyses performed in triplicate and repeated four times. Error bars, standard deviations; P values calculated using two-tailed Student’s t-test.

Figure 4 | Dchs1 deficiency causes MVP and myxomatous degeneration in the adult mouse. Echocardiography (Echo), MRI, histopathology and 3D reconstructions performed on 9-month-old male Dchs1+/+ and Dchs1+/− mouse hearts. Echo, posterior leaflet prolapse in Dchs1+/− (green arrow) (n = 6 per genotype). Immunohistochemistry (IHC), Dchs1+/− (n = 5) anterior and posterior leaflets (AL, PL) exhibit myxomatous degeneration and expansion of proteoglycan expression compared to Dchs1+/+ (n = 7); collagen (red), proteoglycans (green). MRI shows posterior leaflet (PL) thickening in Dchs1+/− (arrow, inset) compared to control littermates. 3D reconstructions of MRI: Dchs1+/− mice exhibit thickened and elongated leaflets compared to Dchs1+/+ (two-tailed Student’s t-test was used to calculate P values; P = 0.01, n = 4 per genotype).

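The half-lives reported above (5.8 h versus 1.6 h in transfectants; 1.73 h versus 0.46 h in MVICs) come from cycloheximide chases. The fitting procedure is not described here; one common approach, assumed in this sketch, treats loss of signal after translation is blocked as first-order decay, fits a straight line to log band intensity against time, and reports t1/2 = ln 2 / k. Intensities would typically be normalized to the loading control (tubulin) first.

```python
import math
from typing import Sequence


def half_life(times_h: Sequence[float], signal: Sequence[float]) -> float:
    """Half-life (h) from a chase, assuming first-order decay.

    Fits ln(signal) = a - k * t by ordinary least squares and returns ln(2) / k.
    """
    ys = [math.log(s) for s in signal]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ys))
    k = -sxy / sxx  # decay rate per hour
    return math.log(2) / k


def fraction_remaining(t_h: float, t_half_h: float) -> float:
    """Fraction of protein left after t_h hours under first-order decay."""
    return 0.5 ** (t_h / t_half_h)
```

Under this model, four hours after cycloheximide addition about 62% of wild-type protein (t1/2 = 5.8 h) would remain, compared with about 18% of the mutant (t1/2 = 1.6 h).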
To determine whether MVP has a developmental origin, we performed expression and functional analyses at embryonic and fetal time points. RNA in situ hybridization and immunohistochemistry showed expression of Dchs1 in endocardial and mesenchymal cells of atrioventricular valve leaflets at all time points examined (Extended Data Fig. 8). While no morphological defects were observed in Dchs1+/− mice during early embryonic development (E11.5–E13.5), at later time points (E15.5–E17.5) Dchs1+/− mice displayed changes in mitral valve shape (Fig. 5a–c), which were more severe in Dchs1−/− animals. Histology and three-dimensional reconstructions of anterior and posterior mitral leaflets at E17.5 of Dchs1+/+, Dchs1+/− and Dchs1−/− mice showed comparable leaflet volumes. However, Dchs1+/− and Dchs1−/− animals exhibited statistically significant changes in valve length and width (Supplementary Video 9 and Fig. 5a–c). In most leaflet regions measured, Dchs1+/− animals displayed an intermediate phenotype, demonstrating a gene dosage effect. These shape changes implicate Dchs1 as critical for proper anatomical patterning of the valve, consistent with previous reports of dachsous function in the Drosophila wing. We therefore performed in vivo lineage-tracing studies in Dchs1+/+ and Dchs1+/− mice. Crossing the WT1-Cre/ROSA-eGFP line onto both Dchs1+/+ and Dchs1+/− backgrounds allowed visualization of patterning defects of epicardial-derived cells (EPDCs) during migration into the posterior leaflet. This EPDC population initially migrates into the posterior leaflet as a sheet of cells. However, in Dchs1+/− mice this sheet-like appearance is disrupted and an increase in EPDCs infiltrating diffusely throughout the valve tissue is observed (Extended Data Fig. 9a–c, Supplementary Video 10), concomitant with altered cellular patterning and alignment (Extended Data Fig. 9d, e). In vitro studies of MVICs from Dchs1+/− mice and from the family 2 MVP proband (p.R2330C) also show this increased migration phenotype (Extended Data Fig. 10a, b). Taken together, the mouse and human studies support a developmental aetiology for MVP and invoke a model in which cell migration and patterning defects mediated by DCHS1 contribute to disease pathogenesis.

Figure 5 | Developmental aetiology for MVP. a, b, Haematoxylin and eosin staining and 3D reconstructions of E17.5 Dchs1+/+, Dchs1+/− and Dchs1−/− mouse hearts showing thickening of anterior and posterior leaflets (AL, PL) in Dchs1−/− mice compared to Dchs1+/+. Dchs1+/− valves display an intermediate phenotype. c, Quantification of valve dimensions showing that Dchs1−/− (green bars) and Dchs1+/− (red bars) anterior and posterior lengths were significantly reduced compared to Dchs1+/+ (blue bars) leaflets. Dchs1−/− and Dchs1+/− valves displayed increased thickness throughout the leaflets compared to Dchs1+/+. Scale bars, 100 μm. n = 5 per genotype; two-tailed Student’s t-test was used to calculate P values; *P < 0.01.

MVP is one of the most common cardiovascular diseases, affecting nearly 1 in 40 people worldwide. Although its heritability and variable expression in large pedigrees have been known for decades, its genetic underpinnings have remained elusive. We report the discovery of two loss-of-function mutations in DCHS1 that segregate with MVP in three families. Our mouse models exhibit classical MVP, a phenotype that was traced back to developmental errors during valve morphogenesis. These findings provide a model for understanding inherited non-syndromic MVP as a developmentally based disease that progresses over the lifespan of affected individuals, consistent with previous reports on the natural history of MVP. A robust estimate of the total contribution of rare DCHS1 genetic variation to sporadic MVP has yet to be determined and will require sequencing of large cohorts of MVP patients. Nonetheless, the discovery of this novel mechanistic pathway, elucidated by intensively studying rare familial mutations, will facilitate the identification of additional MVP genes and reveal pathogenic mechanisms that hold the potential for pre-surgical therapy for this very common cardiac disease.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Study participants. Family 1 was originally recruited through the Echocardiography Laboratory at Massachusetts General Hospital as part of a phenotype-driven genetic study of MVP. MVP was diagnosed by specific criteria (>2 mm atrial leaflet displacement in a parasternal long-axis view). The study was approved by the Institutional Review Board of Partners Healthcare, Boston, Massachusetts, and all participants provided written informed consent. Complete details of the linkage analysis on this large, multigenerational family have been previously published. In brief, the family contains 41 individuals in five generations. Echocardiograms and DNA were obtained on 28 subjects, of whom 12 were diagnosed with MVP, three were classified as having nondiagnostic minimal leaflet displacement, and 13 were unaffected. The three patients with non-diagnostic valve leaflet displacement were considered unknown for the original linkage analysis. The proband had prominent MVP with thickened leaflets, severe mitral regurgitation, and heart failure ultimately requiring surgical valve repair (Fig. 1c and Supplementary Videos 1–3). Other affected members also showed diffuse leaflet thickening, prolapse, and mitral regurgitation of varying severity; one required surgical repair. No extracardiac manifestations of connective-tissue abnormalities or Marfan syndrome were present in any family members. Following a complete genome scan, parametric and non-parametric analyses confirmed linkage of this family to a 4.3 cM region of chromosome 11p15.4. Consistent with the model of sex- and age-dependent penetrance, several of the unaffected members who carried the MVP allele were less than 15 years old at the time of evaluation (Fig. 1a). Importantly, an analysis using only affected individuals confirmed the linkage result.

DNA sequencing and variant calling in family 1. In order to identify the muta- 
tion, four affected individuals who shared the disease haplotype were chosen for 
sequencing (Fig. 1a). To reduce the likelihood of random haplotype sharing, we 
selected individuals with four distinct haplotypes on the non-MVP allele. A 2.1 Mb 
region of human chromosome 11 (5094774-7248926; NCBI36 coordinates) was 
targeted and screened for repetitive regions using the SureSelect system (Agilent). 
DNA extraction was performed using the AutoGenFlex STAR automated system 
(Autogen) and FlexiGen DNA purification reagents (Qiagen) according to the 
manufacturers’ instructions. Bait oligonucleotides were designed to the non-repet- 
itive regions of the targeted linkage peak, resulting in 1.03 Mb of target sequence 
using the SureSelect in-solution long RNA baits (Agilent). Captured DNA was 
amplified and quantified using the Agilent High Sensitivity DNA Kit for the 
Agilent 2100 Bioanalyzer, and sequenced using Illumina sequencing chemistry 
(paired-end, 100 cycles) at the Venter Institute supported by the NHLBI 
Resequencing and Genotyping Program. One hundred bases were sequenced 
from each end of the captured DNA fragments. Image analysis and base calling 
were performed using Illumina’s GA Pipeline version 1.6.0. Sequence reads were 
mapped to the human genome (NCBI36) and variants were identified using clc-ngs-cell-2.0.5-linux_64
(clc_ref_assemble_long -q -p fb ss 180 360 -I -r, and find_variations -c 8 -v -f 0.2).
Variants were classified using VariantClassifier**. In total, 4,891 SNVs were identified
in the four subjects, of which 1,951 were shared by all four.
We classified all rare SNVs or those of unknown frequency based on conservation 
data, population genetic data, and predictive functional impact from public 
resources. We assessed conservation at each variant locus using PhyloP”, PhastCons*
and GERP”. Population frequency was assessed first with the Exome Variant Server,
NHLBI GO Exome Sequencing Project (ESP), Seattle, Washington
(http://evs.gs.washington.edu/EVS/), and then with the 1000 Genomes Project” as a
secondary filter, these being the resources available during our initial analyses;
we recognized that the low-depth coverage of the 1000 Genomes Project limits its
ability to characterize rare variants.
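
The two filtering steps described above (keeping variants shared by all sequenced affected relatives, then retaining those that are rare or of unknown frequency) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the variant tuples, the helper names `shared_variants` and `rare_or_unknown`, and the 1% allele-frequency cut-off are hypothetical, and conservation scoring is omitted.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def shared_variants(calls: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return the variants called in every individual (set intersection)."""
    call_sets = list(calls.values())
    if not call_sets:
        return set()
    shared = set(call_sets[0])
    for other in call_sets[1:]:
        shared &= other
    return shared


def rare_or_unknown(
    variants: Iterable[Variant],
    resources: List[Dict[Variant, float]],
    max_af: float = 0.01,  # hypothetical cut-off, not taken from the study
) -> Set[Variant]:
    """Keep variants whose allele frequency is <= max_af, or unknown, in every resource.

    Resources are applied in order (e.g. ESP first, then 1000 Genomes); a variant
    absent from a resource is treated as 'unknown frequency' and kept.
    """
    kept = set(variants)
    for freqs in resources:
        kept = {v for v in kept if freqs.get(v) is None or freqs[v] <= max_af}
    return kept


if __name__ == "__main__":
    # Hypothetical variants and calls for four affected relatives
    a = ("11", 1000, "A", "G")
    b = ("11", 2000, "C", "T")
    c = ("11", 3000, "G", "A")
    calls = {"ind1": {a, b, c}, "ind2": {a, b}, "ind3": {a, b}, "ind4": {a, b, c}}
    shared = shared_variants(calls)  # {a, b}
    esp = {a: 0.20}  # common in the first resource, so removed
    kg = {b: 0.001}  # rare in the second resource, so kept
    print(sorted(rare_or_unknown(shared, [esp, kg])))  # [('11', 2000, 'C', 'T')]
```

Treating a variant that is missing from a resource as "unknown frequency" rather than "absent" keeps it in play, which matches the caution about low-coverage population data expressed in the text.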

Sporadic MVP patient cohort. As part of an international consortium on mitral 
valve disease, we initiated collection of MVP patients with the eventual goal of 
performing genome-wide association studies (GWAS) in MVP. To date, 1,896 patients have been collected in
the United States, France, and Spain. MVP was defined as systolic displacement of
one or both mitral leaflets >2 mm beyond the annulus in parasternal or apical
long-axis views; asymmetric posterior leaflet prolapse was also included in
any view, including apical 4-chamber, when confirmed by side-to-side long-axis
scanning”*. Patients were required to have no evidence on history, physical
examination or imaging of Marfan syndrome or other connective tissue disorders
associated with MVP.

Exome sequencing, genotyping, and variant evaluation. As part of an MVP 
exome sequencing pilot project conducted by the Leducq Mitral Network, exome 
data were generated on 21 patients with severe, early-onset MVP and made
available to identify variants in DCHS1. Fifteen patients were collected in Paris
and had severe bileaflet mitral valve prolapse with myxomatous leaflets and an
average age at onset of 15 years. Six patients were collected in Nantes, France, and
had similar clinical characteristics with an average age at onset of 42 years. Exome
capture was carried out using the SureSelect Human All Exon System according to the
manufacturer’s protocol version 1.0, which is compatible with Illumina paired-end
sequencing. Exome-enriched genomes were multiplexed by flow cell for 101-bp
paired-end read sequencing according to the protocol for the HiSeq 2000 sequencer
(version 1.7.0; Illumina) to allow a minimum coverage of 30×. Reads were
aligned to the human reference genome (UCSC NCBI37/hg19) using the Burrows-
Wheeler Aligner (version 0.5.9). Evaluation of the DCHS1 gene yielded four novel
coding variants that were confirmed by repeat Sanger sequencing:
6646587 G/A (p.R2330C), 6646709 G/A (p.A2289V), 6648584 C/T (p.A1896T) and
6648820 A/G (p.V1817A) (base-pair positions are NCBI36 coordinates). These four
variants, in addition to the variants identified in family 1, were genotyped using 
Sequenom technology in the sporadic cohort. The major steps included primer 
and multiplex assay design using Sequenom’s MassARRAY Designer software, 
DNA amplification by PCR, post-PCR nucleotide deactivation using shrimp alkal- 
ine phosphatase (SAP) to remove phosphate groups from unincorporated dNTPs, 
single-base extension reaction for allele differentiation, salt removal using ion- 
exchange resin, and mass correlated genotype calling using SpectroCHIP array 
and MALDI-TOF mass spectrometry. Quality control to assess sample and
genotyping quality, and to remove poorly performing SNPs or samples where necessary, was
performed in PLINK, a whole-genome association analysis toolset. We predicted
the impact on gene function using PolyPhen2", MutationTaster’? and LRT”.
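
Sample and SNP quality control of the kind delegated to PLINK above is typically based on genotype call rates. The sketch below shows the idea in plain Python; the 95% thresholds, the `call_rate_qc` name and the data layout are assumptions chosen for illustration, not the study's settings. In practice PLINK's `--geno` (per-SNP missingness) and `--mind` (per-sample missingness) options perform this kind of filtering.

```python
from typing import Dict, List, Optional, Tuple

# sample ID -> {SNP ID -> genotype string, or None if the call failed}
Genotypes = Dict[str, Dict[str, Optional[str]]]


def call_rate_qc(
    genotypes: Genotypes,
    min_snp_call_rate: float = 0.95,     # illustrative threshold
    min_sample_call_rate: float = 0.95,  # illustrative threshold
) -> Tuple[List[str], List[str]]:
    """Return (SNPs kept, samples kept) after call-rate filtering.

    SNPs are filtered first across all samples; samples are then filtered
    on the retained SNPs only.
    """
    samples = list(genotypes)
    if not samples:
        return [], []
    snps = sorted({snp for calls in genotypes.values() for snp in calls})

    def called(sample: str, snp: str) -> bool:
        return genotypes[sample].get(snp) is not None

    kept_snps = [
        snp for snp in snps
        if sum(called(s, snp) for s in samples) / len(samples) >= min_snp_call_rate
    ]
    if not kept_snps:
        return [], []
    kept_samples = [
        s for s in samples
        if sum(called(s, snp) for snp in kept_snps) / len(kept_snps) >= min_sample_call_rate
    ]
    return kept_snps, kept_samples


if __name__ == "__main__":
    # Hypothetical Sequenom calls for four samples at two SNPs
    g = {
        "s1": {"rs1": "AG", "rs2": "AA"},
        "s2": {"rs1": "AA", "rs2": None},
        "s3": {"rs1": "GG", "rs2": "AG"},
        "s4": {"rs1": None, "rs2": None},
    }
    print(call_rate_qc(g, min_snp_call_rate=0.75, min_sample_call_rate=0.9))
```

Filtering SNPs before samples avoids penalizing otherwise good samples for assays that failed across the whole plate; the opposite order is also defensible.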

Identification of families 2 and 3. To identify other mutations in DCHS1,
we first evaluated DCHS1 in the exome sequence data described above, reasoning
that early-onset forms may be more likely to have strong genetic aetiologies. Rare
variants causing amino acid substitutions in DCHS1 were identified in four indi- 
viduals (p.V1817A, p.A1896T, p.A2289V, and p.R2330C) and genotyped in a 
cohort of 1,864 sporadic MVP patients that included the 21 individuals with 
exome data; two of these variants, both localized to exon 19, were observed in 
the MVP cohort (p.A2289V in two cases and p.R2330C in three cases). The 
proband in family 2 carried the p.R2330C variant and underwent surgery for 
MVP in Paris. We were able to collect DNA and echocardiograms on first-degree 
relatives at that time. Additional clinical characteristics of the proband in family 
2 included congestive heart failure (NYHA II/III) with left ventricular dilatation 
(70/50 mm end-diastolic/end-systolic dimensions), impaired left ventricular sys- 
tolic function (ejection fraction 53%, low for this volume overload), recurrent 
symptomatic atrial fibrillation, non-sustained ventricular tachycardia, and exercise-induced
pulmonary hypertension (70 mm Hg systolic). The proband in
family 3 was originally collected in Amiens, France. All echocardiograms were 
read in both Boston and Paris, and readers were blinded to genotype data.

D. rerio studies. Husbandry, knockdown and expression analyses were per- 
formed in the wild-type D. rerio (zebrafish) strain Tubingen AB. Morpholinos 
were injected at a dose of 1.5 ng (after dose optimization) into single-cell embryos 
to achieve gene knockdown, and phenotypes were examined at 48 and 72 h post-fertilization
in three separate experiments of 50-75 embryos and compared with
controls using Fisher’s exact test. Morpholino (Gene Tools LLC, Philomath, OR)
sequences were as follows: apbb1, AACAAAGCGTACCACTCAGATTAGC;
dchs1a, TAAAGAAATGACAGTCCTACCTCCA; and dchs1b, CATAACTGTTAAGAGTTCCGCTACA.
Knockdown was confirmed by quantitative polymerase chain reaction (qPCR). qPCR
was performed as previously described’. In brief, 20-30
morpholino-injected embryos were collected at 72 hpf, and snap frozen in liquid 
nitrogen. TRIzol (Sigma) was added, RNA was purified according to the manu- 
facturer’s instructions, and cDNA was prepared using a Superscript III Kit 
(Invitrogen). Primer sets were as follows: apbb1, 5'-GTGGAGGCGAGAACAGAG and
5'-CCAGCAGGAAGATCCGTGTC; dchs1a, 5'-GTTTCATGGAGGTTACAGC and
5'-CTTAATCCACCCCCATCCAG; dchs1b, 5'-GTTTCCTTGAGGTAAAGGCGG and
5'-GGCCACCCCCATCGGACG. qPCR was performed using
SYBR Green (Applied Biosystems) in triplicate on an Applied Biosystems 7500
Fast Real-Time PCR instrument, and expression was normalized to β-actin. All zebrafish
experiments were performed under protocols approved by the Institutional 
Animal Care and Use Committee at Massachusetts General Hospital. 
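
The text states that qPCR values were normalized to β-actin but does not give the calculation. One common choice is the comparative Ct (2^-ΔΔCt) method, sketched below as an assumption: it presumes roughly equal, near-100% amplification efficiencies for target and reference, and the function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    target_ct_mo: Sequence[float],
    ref_ct_mo: Sequence[float],
    target_ct_ctrl: Sequence[float],
    ref_ct_ctrl: Sequence[float],
) -> float:
    """Fold change of a target transcript in morpholino-injected vs control embryos.

    Technical triplicates are averaged, each sample is normalized to the
    reference gene (delta Ct), and the morpholino sample is compared with the
    control (delta-delta Ct). Assumes ~100% amplification efficiency.
    """
    delta_mo = mean(target_ct_mo) - mean(ref_ct_mo)
    delta_ctrl = mean(target_ct_ctrl) - mean(ref_ct_ctrl)
    delta_delta = delta_mo - delta_ctrl
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold two cycles later after knockdown.
    fold = relative_expression([26.0, 26.1, 25.9], [18.0, 18.0, 18.0],
                               [24.0, 24.1, 23.9], [18.0, 18.0, 18.0])
    print(f"relative expression: {fold:.2f}")  # prints 0.25 (about 75% knockdown)
```

If primer efficiencies differ appreciably, an efficiency-corrected model (for example, one using standard curves for each primer pair) would be more appropriate than the simple power-of-two form.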

D. rerio in situ hybridizations. In situ hybridizations were performed as 
previously described™ using a partial clone of dchs1b (Open Biosystems, Clone 
ID 7136458) amplified with primers in which a T7 RNA polymerase site was
engineered onto the 5’ end of the reverse primer (forward, 5'-GGCAGTTCAAGTGGTGGT;
reverse, 5'-TAATACGACTCACTATAGGGTTAAATCCTCATCTCAGCCTCA). The dchs1b probe
was produced using T7 RNA polymerase (Ambion) and digoxigenin-labelled
ribonucleotides (Roche). Other
riboprobes used in the study have been previously described”. 
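
Because primers are written 5' to 3', the position of the T7 promoter in the reverse primer can be checked directly. The sketch below searches the two dchs1b primers listed above for the minimal T7 promoter consensus, TAATACGACTCACTATAG (a well-known sequence, but not one stated in this section); the helper name `t7_offset` is hypothetical.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # minimal T7 RNA polymerase promoter consensus

FORWARD = "GGCAGTTCAAGTGGTGGT"
REVERSE = "TAATACGACTCACTATAGGGTTAAATCCTCATCTCAGCCTCA"


def t7_offset(primer: str) -> int:
    """0-based offset of the T7 promoter from the primer's 5' end, or -1 if absent."""
    return primer.upper().find(T7_PROMOTER)


if __name__ == "__main__":
    for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
        offset = t7_offset(seq)
        where = "absent" if offset < 0 else f"starts {offset} nt from the 5' end"
        print(f"{name}: T7 promoter {where}")
```

An offset of 0 places the promoter at the 5' end of the reverse primer. After PCR it therefore lies downstream of the sense sequence, so T7 transcription runs in the antisense direction and yields an antisense riboprobe.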

Generation of DCHS1 expression constructs. Human DCHS1 and the mutant 
containing the c.590C>T and c.7538G>A sequence changes were synthesized by 
Integrated DNA Technologies. A unique EcoRI site, a T7 polymerase site, and a 
Kozak sequence were added to the 5’ end of each gene, while a V5 tag and unique 
XhoI site were added to the 3’ end. Each gene was then subcloned into the
expression vector pcDNA3.1. Additional expression constructs were generated 
that contained only the c.590C>T (p.P197L) mutation or only the c.7538G>A
(p.R2513H) mutation. These constructs were made using the QuikChange II XL 
Site-Directed Mutagenesis Kit (Agilent Technologies) according to the manufacturer’s
instructions. The P197L construct was generated from the double mutant construct
by changing the R2513H (c.7538G>A) mutation back to the wild-type
sequence using the following primers: 5'-gctgatggaagccGcagccatgccgct and
5'-agcggcatggctgCggcttccatcagc (the uppercase base indicates the base pair changed).
The R2513H construct was generated by introducing the c.7538G>A mutation
into the wild-type DCHS1 construct using the following primers:
5'-gctgatggaagccAcagccatgccgct and 5'-agcggcatggctgTggcttccatcagc.
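
Two properties of the constructs above can be checked mechanically. First, each c. (coding) position maps to a codon by integer arithmetic. Second, each mutagenesis primer pair should consist of exact reverse complements whose forward members differ only at the changed base. The sketch below applies both checks; the helper names are hypothetical and sequences are compared case-insensitively.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def codon_of(c_pos: int) -> tuple:
    """Map a 1-based coding position (c.N) to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def differing_positions(a: str, b: str) -> list:
    """0-based indices at which two equal-length sequences differ (case-insensitive)."""
    return [i for i, (x, y) in enumerate(zip(a.lower(), b.lower())) if x != y]


# Mutagenesis primer pairs at the R2513H site (wild-type G or mutant A at c.7538).
WT_FWD = "gctgatggaagccgcagccatgccgct"
WT_REV = "agcggcatggctgcggcttccatcagc"
MUT_FWD = "gctgatggaagccacagccatgccgct"
MUT_REV = "agcggcatggctgtggcttccatcagc"

if __name__ == "__main__":
    print(codon_of(590), codon_of(7538))  # (197, 2) (2513, 2)
    print(reverse_complement(WT_FWD) == WT_REV, reverse_complement(MUT_FWD) == MUT_REV)
    print(differing_positions(WT_FWD, MUT_FWD))  # [13], i.e. the 14th base
```

Both substitutions fall at the second position of their codons, consistent with Pro to Leu (CCN to CTN) for p.P197L and Arg to His (CGN to CAN) for p.R2513H; in the forward primers the changed base sits inside the "cgc"/"cac" codon.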

Preparation and injection of DCHS1 mRNA. mRNA was prepared from the 
wild-type DCHS1 and DCHS1 mutant expression vectors using a T7 mMessage 
mMachine Kit (Ambion) according to the manufacturer’s instructions. Injection 
mixtures containing 0.75 ng dchs1b MO alone, or 0.75 ng dchs1b MO plus 7 fg μl⁻¹
of human DCHS1 mRNA (either wild type or mutant), were injected into one-cell
embryos, and fish were scored for atrioventricular canal defects (failure to loop
and presence of regurgitation) 72 h later. Data were collected from three
independent experiments performed with 20-30 embryos each, and comparisons were
made using Fisher’s exact test.
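
Comparisons of defect rates between injection groups reduce to a 2 × 2 table (defect present or absent, by group). The sketch below implements the two-sided Fisher's exact test from the hypergeometric distribution using only the standard library. The function name and the example counts are hypothetical; in practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. MO alone vs MO + mRNA); columns are outcome
    (defect vs normal). The P value sums the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # a small relative tolerance so that tables tied with the observed one are included
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical counts: 18/25 embryos with defects after MO alone vs 6/25 after co-injection.
    print(f"P = {fisher_exact_two_sided(18, 7, 6, 19):.4g}")
```

The exact test is preferred over a chi-squared test here because some cells in embryo-scoring tables can be small.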

Isolation of DCHS1 p.R2330C MVP and control patient mitral valve tissue 
and valvular interstitial cells. Resected posterior mitral valve tissue was used for 
culture and histology. For culture, valve pieces were minced in phosphate buffered 
saline (PBS) and washed in DMEM with antibiotics (penicillin/streptomycin (P/S) 
and fungizone) and incubated in DMEM with collagenase type II (Worthington) 
(1 mg ml⁻¹) at 37 °C for 12 h. Following mechanical dissociation in DMEM, the
cell suspension was filtered through a 40-μm cell strainer and cells were cultured in
DMEM with 15% fetal calf serum and antibiotics (P/S, fungizone). Although rare
valve endothelial cells were present at P0, only cells with a fibroblastic phenotype
(VICs) remained after passages 1-2. For all experiments, these valvular interstitial
cells were used before passage 5. For histology, valves were fixed in formalin,
embedded in paraffin and sectioned at 5 μm. Movat’s pentachrome staining
was performed using standard procedures.

Cell culture studies. Wild type, p.P197L, p.R2513H and p.P197L/R2513H 
DCHS1 constructs were either synthesized by Integrated DNA Technologies or 
generated by site-directed mutagenesis (as described above), with an amino- 
terminal V5 epitope tag. Except where indicated, “mutant DCHS1” indicates the 
double mutant p.P197L/p.R2513H haplotype in family 1. These constructs were 
expressed in mycoplasma-free HEK293 cells (ATCC, not independently authen- 
ticated) using cationic lipid-mediated transient transfection (Lipofectamine LTX, 
Invitrogen). Protein expression of transfected HEK cells was measured by quantifying
western blots using an antibody to the V5 epitope tag (Invitrogen). For patient
cells, control and p.R2330C valvular interstitial fibroblasts from
posterior leaflets were plated at 2.5 X 10* cells in a 24-well dish, and protein
stability experiments were performed 24 h later. Protein stability was assessed by
adding cycloheximide 24 h after transfection (WT and p.R2513H transfectants) or after
plating (control and p.R2330C patient cells): medium containing 100 ng ml⁻¹
cycloheximide was added at 24 h, and each well was harvested as above at the
indicated time points. Western
blots were probed with either a mouse anti-V5 primary antibody (1:4,000 dilution;
Invitrogen) or a rabbit anti-Dchs1 antibody** (1:1,500 dilution), and an HRP-
linked secondary at the same dilution (Thermo Scientific). Blots were also probed 
with a mouse anti-tubulin primary (Millipore) at a 1:4,000 dilution, and the same 
secondary antibody as above. Blots were treated with Pierce ECL Substrate and 
visualized on film. For quantitation, blot pixel intensity was measured in ImageJ
(NIH) and normalized to tubulin. Each sample was run in triplicate; a regression
curve was fitted and half-lives were calculated using SigmaPlot 12.
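
The half-life calculation above was done in SigmaPlot; the underlying idea can be sketched as a first-order decay fit. Assuming the tubulin-normalized signal decays as S(t) = S0·exp(-kt), a least-squares line through ln S(t) has slope -k, and t½ = ln 2 / k. The function name and example values are hypothetical, and this log-linear fit stands in for whatever regression model was actually used.

```python
import math
from typing import Sequence


def half_life(times_h: Sequence[float], band: Sequence[float], tubulin: Sequence[float]) -> float:
    """Protein half-life (same units as times_h) from a cycloheximide chase.

    Each band intensity is normalized to its tubulin loading control, then a
    least-squares line is fitted to ln(normalized intensity) versus time.
    """
    if len(times_h) < 2:
        raise ValueError("need at least two time points")
    y = [math.log(b / t) for b, t in zip(band, tubulin)]
    t_mean = sum(times_h) / len(times_h)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
    k = -sxy / sxx  # first-order decay constant (per time unit)
    if k <= 0:
        raise ValueError("no decay detected; half-life undefined")
    return math.log(2) / k


if __name__ == "__main__":
    times = [0, 2, 4, 8]
    tubulin = [1.0, 1.1, 0.9, 1.0]
    # Hypothetical band intensities decaying with a 6 h half-life
    band = [100 * math.exp(-math.log(2) / 6 * t) * tub for t, tub in zip(times, tubulin)]
    print(f"t1/2 = {half_life(times, band, tubulin):.2f} h")
```

Fitting on the log scale weights late, faint time points as heavily as early ones; a nonlinear fit of the exponential on the original scale is a common alternative.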

Mouse studies. Dchs1 mice and genotyping were previously described”. All analyses
were performed blinded to genotype; following phenotypic analyses, the genotype of each
sample was matched with the experimentally determined data sets. Histology,
mitral valve analyses (echocardiography, MRI, and morphometric determination),
and expression studies were performed on embryonic and adult (9-month
male) wild-type (Dchs1+/+), heterozygous (Dchs1+/−) and knockout (Dchs1−/−)
hearts (C57BL/6;Sv129 mixed background). For histology, fetal (E17.5) and adult
(9-month) hearts were processed for haematoxylin and eosin staining and
immunohistochemistry (IHC) as previously described*’. For fetal analyses,
Dchs1+/+, Dchs1+/− and Dchs1−/− mice were analysed (n = 5 per genotype). Because
Dchs1−/− mice die neonatally, and because the human DCHS1 mutations are
loss-of-function alleles, adult analyses were restricted to Dchs1+/+ (n = 7) and
Dchs1+/− (n = 5) mice. For all analyses, male mice were used. Reagents used for IHC
were hyaluronan binding protein (HABP) to stain proteoglycans (1:100; Calbiochem),
collagen I (1:100; MDbio) and Hoechst to stain nuclei (1:10,000; Invitrogen). AMIRA 3D
reconstructions were performed to generate volumetric measurements of fetal 
(E17.5) anterior and posterior mitral leaflets (n = 5 for each genotype) as previously
described’*. Length and width measurements of the mitral leaflets were
obtained from histological sections. Twenty-five consecutive 5-μm sections from anterior
and posterior mitral leaflets of each genotype were used for measurements
(Dchs1+/+, n = 4; Dchs1+/−, n = 7; Dchs1−/−, n = 5). ImageJ software was used
to measure the length of anterior and posterior leaflets from annulus to tip and the 
perpendicular width at the base, mid-region and tip. Measurements were compared
with wild-type data to generate fold changes, and statistical significance
(P < 0.01) was determined using Student’s t-test. For quantification of mitral valve
interstitial cell alignment: hearts were initially dissected from E17.5 fetuses. The 
apex of the heart was dissected and discarded, and the left atrium was removed.
A cut was made cranial to caudal along the anterior aspect of the myocardium.
The left ventricle was reflected to visualize the leaflets. The left
ventricle and interventricular septum were pinned so as to stretch the papillary
muscles and chordae, which yielded the leaflet as a planar sheet of
tissue. The tissue was fixed in this position to keep the leaflet in
this orientation, because otherwise the leaflet curls, making measurements
and the plane of orientation inconsistent between animals. Once the planar
leaflet tissue had been fixed in 4% PFA for 5 min, the leaflet was released from the heart
by cutting the chordae and dissecting along the annulus fibrosus. The tissue was
then processed by standard protocols and embedded en face in paraffin. This
technique was performed blinded to genotype and in exactly the same
manner for all valve isolates. Performing the dissection and tissue processing in
this way ensures that all leaflets lie in nearly identically oriented planes. The
vector maps and quantification of cell alignment were performed blinded by two
independent researchers. Only after the data were generated did a third researcher
perform the PCR genotyping. Vector maps were manually generated as previously
described”. Cells that deviated >10 degrees from an average alignment plane were
counted as misaligned. Interstitial cells within anterior leaflets from each genotype
were measured. The total numbers of cells measured were 1,083 (WT; n = 4),
1,118 (Het; n = 4) and 1,953 (KO; n = 4). Statistical significance was assessed
using Student’s t-test, with P < 0.05 considered significant. Very little variation
existed between the independent valves of each genotype, as depicted graphically.
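
The misalignment criterion above (deviation of more than 10 degrees from an average alignment plane) can be expressed numerically. Cell orientations are axial data, since 0° and 180° describe the same direction. One standard way to average them, assumed here because the text does not say how the average plane was obtained, is to double the angles, take the circular mean and halve the result. Function names and example angles are hypothetical; passing `reference_deg` compares cells with a fixed axis, such as the proximal-distal axis, instead.

```python
import math
from typing import Optional, Sequence, Tuple


def mean_axis_deg(angles_deg: Sequence[float]) -> float:
    """Circular mean of axial orientations (degrees), via angle doubling."""
    sx = sum(math.cos(math.radians(2 * a)) for a in angles_deg)
    sy = sum(math.sin(math.radians(2 * a)) for a in angles_deg)
    return math.degrees(math.atan2(sy, sx)) / 2


def axial_deviation_deg(angle: float, axis: float) -> float:
    """Smallest angle (0-90 degrees) between an orientation and an axis."""
    return abs((angle - axis + 90) % 180 - 90)


def misaligned_fraction(
    angles_deg: Sequence[float],
    threshold_deg: float = 10.0,
    reference_deg: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (axis used, fraction of cells deviating > threshold from it)."""
    axis = mean_axis_deg(angles_deg) if reference_deg is None else reference_deg
    n_bad = sum(1 for a in angles_deg if axial_deviation_deg(a, axis) > threshold_deg)
    return axis, n_bad / len(angles_deg)


if __name__ == "__main__":
    # Hypothetical orientations of four cells; 175 deg is nearly parallel to 0 deg
    axis, frac = misaligned_fraction([0, 5, 175, 90])
    print(f"mean axis {axis:.1f} deg, {frac:.0%} misaligned")
```

Averaging the raw angles arithmetically would wrongly place the axis of cells at 5° and 175° near 90°. Angle doubling avoids this by treating both as nearly horizontal.
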
For mouse echocardiography, the Vevo2100 imaging system (VisualSonics,
Toronto, Canada) was used with a 22-55 MHz linear transducer probe (MS550D)
for 2D B-mode and M-mode analysis. Heart rate was maintained at 400-500
bpm under isoflurane anaesthesia. The mitral valve leaflet was visualized and its
function was assessed in parasternal long-axis B-mode view by placing the trans- 
ducer on the left lateral chest wall. End-systolic and end-diastolic left ventricular 
dimensions and wall thicknesses were measured according to the American 
Society of Echocardiography guidelines as applied to mice. Left ventricular wall 
thickness was measured at the level of interventricular septum and the posterior 
wall. Left ventricular volume was calculated by Simpson’s method of disks, and
ejection fraction was determined as EF = (LVEDV − LVESV)/LVEDV, where LVEDV and
LVESV are the left ventricular end-diastolic and end-systolic volumes. Offline image analyses
were performed using dedicated VisualSonics Vevo2100 1.2.0 software. Mitral 
valve prolapse was determined based on superior systolic displacement of one 
or more leaflets above the line connecting the annular hinge points in the long- 
axis view (n = 6 per genotype). For MRI experiments, 9-month-old male Dchs1+/+
and Dchs1+/− mice (n = 4) were sacrificed, and the hearts were perfusion fixed
and immersed overnight in 10% formalin containing gadolinium (ProHance; 1:40,
12.5 mM) before imaging. MRI was undertaken at 7 T using a Bruker BioSpin console
(ParaVision 5.1) with a volume transmitter coil and a phased-array surface
coil. Gradient echo FLASH 3D images were collected with repetition time/echo
time = 50 ms/5.4 ms, flip angle = 30°, number of excitations = 3, matrix =
256 × 256 × 256 and voxel resolution = 55 × 55 × 59 μm. Images in DICOM
format were imported into AMIRA 3D reconstruction software, and volume
quantification was performed as described previously**. Pairwise comparisons
of littermates were performed, and statistical significance was determined
by Student’s t-test (P < 0.01). All mouse experiments were performed under
protocols approved by the Institutional Animal Care and Use Committee, 
Medical University of South Carolina. Prior to cardiac resection, mice were 
euthanized in accordance with the Guide for the Care and Use of Laboratory 
Animals (NIH Publication No. 85-23, revised 1996). 
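
The volume and ejection-fraction calculations described in the echocardiography methods can be sketched as follows. The single-plane method of discs used here, in which the ventricle is sliced into equal-height circular discs along its long axis, is an assumed simplification of Simpson's method; the Vevo2100 software's exact implementation is not described in the text, and the function names and example dimensions are hypothetical.

```python
import math
from typing import Sequence


def volume_method_of_discs(diameters_mm: Sequence[float], long_axis_mm: float) -> float:
    """Single-plane method of discs: a stack of equal-height circular discs.

    Returns volume in mm^3, which equals microlitres.
    """
    if not diameters_mm:
        raise ValueError("need at least one disc diameter")
    height = long_axis_mm / len(diameters_mm)
    return sum(math.pi / 4 * d ** 2 * height for d in diameters_mm)


def ejection_fraction(edv: float, esv: float) -> float:
    """EF = (LVEDV - LVESV) / LVEDV, returned as a fraction."""
    return (edv - esv) / edv


if __name__ == "__main__":
    # Hypothetical diastolic and systolic tracings (20 discs each)
    edv = volume_method_of_discs([4.0] * 20, 7.0)
    esv = volume_method_of_discs([2.9] * 20, 6.0)
    print(f"EDV {edv:.1f} ul, ESV {esv:.1f} ul, EF {ejection_fraction(edv, esv):.1%}")
```

Real tracings have a different diameter for each disc; uniform diameters are used above only to keep the example short.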

RNA expression analyses of Dchs1 and Apbb1. Section in situ hybridization was 
performed as previously described” on 4 embryos at each time point to localize 
Dchs1 expressing cells throughout cardiac development. A Dchs1 digoxigenin- 
labelled riboprobe (Roche) was generated against region 9222-10180 of accession 
number NM_00162943 and used for in situ hybridization at E11.5, E13.5, and 
E15.5. RNA in situ hybridization for Apbb1 at E14.5 was performed through 
GenePaint"*. Two separate riboprobes were used to analyse Apbb1 RNA express- 
ion at E14.5. These probes were generated against regions 1676-2506 and 
370-1967 of accession number NM_001253885.1. These probes span all known
isoforms of Apbb1 and gave similar spatial RNA expression patterns.

Protein expression. Dchs1 antibodies were generated by immunizing rabbits with
a synthetic peptide corresponding to the rat Dchs1 protein sequence CSTYMVESPDLVEADSAA
(region 1308-1324 of accession number NP_001101014)*.
Immunohistochemistry was performed as described previously’’ using a 1:100 
dilution of primary antibody. 

In vivo lineage trace. To trace the fate of epicardially derived cells in Dchs1+/+
and Dchs1+/− mitral leaflets, the Wt1/IRES/GFP-Cre mouse”® was bred with
Dchs1+/− and Dchs1+/+ mice (n = 4 per genotype). Mice were euthanized at
neonatal day 0 (P0), and hearts were isolated and fixed overnight at 4 °C in 4%
paraformaldehyde in PBS. Hearts were processed through a series of graded
ethanols, cleared in toluene and embedded in Paraplast Plus (Fisherbrand, 23-021-400).
Hearts were sectioned at 5 μm, and slides were treated with 15 ml of antigen
unmasking solution (Vector Laboratories, H-3300) in 1,600 ml of distilled water for
10 min in a pressure cooker (Cuisinart), followed by incubation for 1 h at room
temperature with 1% BSA (Sigma, B4287) in PBS. Expression of EGFP after Cre
recombination was detected by immunofluorescence using antibodies against 
GFP (Abcam, 13970) and myosin heavy chain (MF20; DSHB). Sections (5 μm)
throughout the entire valve were used for 3D reconstructions using Amira software.
The volume of GFP-positive cells and the volume of each mitral leaflet were
measured using this software. GFP-positive and GFP-negative cells were counted
every 15 μm throughout the entire valve. Pairwise comparisons of
littermates were performed, and statistical significance was determined by Student’s
t-test (P = 0.04 for the posterior leaflet and P = 0.86 for the anterior leaflet).
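
Several comparisons in these Methods (morphometry relative to wild type, lineage-trace volumes and counts) rest on fold changes and Student's t-test. The sketch below computes a fold change and the pooled-variance (Student's) t statistic with its degrees of freedom using only the standard library. The function name and example values are hypothetical. Converting t to a P value requires the t distribution, for example via `scipy.stats.ttest_ind`, whose default assumes equal variances.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def fold_change_and_t(wild_type: Sequence[float], mutant: Sequence[float]) -> Tuple[float, float, int]:
    """Return (fold change mutant/WT, Student's t statistic, degrees of freedom).

    Uses the pooled-variance form of the two-sample t-test, i.e. it assumes
    equal variances in the two groups.
    """
    n1, n2 = len(wild_type), len(mutant)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    m1, m2 = mean(wild_type), mean(mutant)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(wild_type) + (n2 - 1) * variance(mutant)) / df
    t = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return m2 / m1, t, df


if __name__ == "__main__":
    # Hypothetical leaflet volumes (arbitrary units)
    fc, t, df = fold_change_and_t([1.0, 1.2, 0.9, 1.1], [1.5, 1.7, 1.4, 1.6])
    print(f"fold change {fc:.2f}, t = {t:.2f}, df = {df}")
```

The equal-variance form matches the assumption stated under "Statistical considerations"; Welch's t-test would be the usual alternative if that assumption were doubtful.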

In vitro migration. Human mitral valve interstitial cells were isolated from a
control and from the patient carrying the DCHS1 p.R2330C mutation (proband of family 2)
and seeded into a Radius 24-well Cell Migration Assay plate containing hydrogels
(Cell Biolabs, CBA-125). Cells were allowed to adhere overnight, and the gels
were then dissolved. Wells were imaged over 24 h, and the area of the cell-free
region was measured in Photoshop v.10.0.01 and subtracted from the initial area
of the hydrogel to give the area migrated over time. Migration in Dchs1+/−
and Dchs1+/+ mice was assessed by explanting P0 neonatal posterior mitral
leaflets onto plastic. Images of the explants and migrating cells were captured 
at multiple time points. Distance migrated was measured as the distance from the 
explant to the farthest migrating cell. Measurements were taken at 5 points 
around the explant and averaged to calculate distance migrated. After 24 h, cells
were fixed in ice-cold 100% methanol for 10 min, and immunofluorescence
was performed using an antibody against N-cadherin (1:1,000 dilution; BD
Transduction Labs, 610920). Pairwise comparisons were performed, and statistical
significance was determined by Student’s t-test (P < 0.05).
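
The two migration read-outs described above reduce to simple arithmetic. For the hydrogel assay, area migrated at each time point is the initial hydrogel (cell-free) area minus the remaining cell-free area. For explants, distance migrated is the mean of the explant-to-farthest-cell distances measured at several points. A minimal sketch, with hypothetical names and values:

```python
from statistics import mean
from typing import Dict, Sequence


def area_migrated(initial_gel_area: float, cell_free_area: Dict[float, float]) -> Dict[float, float]:
    """Area migrated at each time point = initial hydrogel area - remaining cell-free area."""
    return {t: initial_gel_area - a for t, a in sorted(cell_free_area.items())}


def explant_distance(distances: Sequence[float]) -> float:
    """Distance migrated: mean of explant-to-farthest-cell distances around the explant."""
    if not distances:
        raise ValueError("need at least one distance measurement")
    return mean(distances)


if __name__ == "__main__":
    # Hypothetical measurements (areas in mm^2 keyed by hours; distances in um at 5 points)
    print(area_migrated(1.00, {0: 1.00, 12: 0.62, 24: 0.31}))
    print(explant_distance([210, 185, 240, 198, 227]))
```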

Statistical considerations. In all experiments, sample sizes were chosen to provide
a power of 0.8 to detect biologically significant differences between test groups with
a two-sided α = 0.05. Specific statistical tests are listed in the Methods for each
individual experiment. Quantitative biological measurements were assumed to be
normally distributed, and comparison groups were assumed to
have similar variances. For zebrafish experiments, fertilized oocytes were
randomly selected within each clutch for injection with the experimental reagent
(morpholino or mRNA) or control. Mouse experiments were interpreted blinded to genotype.
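
The power statement above (power 0.8, two-sided α = 0.05) implies a required group size once an effect size is fixed. The text does not state which effect sizes were considered, so the sketch below uses a hypothetical standardized difference d and the usual normal-approximation formula n = 2(z(1-α/2) + z(1-β))² / d² per group.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


if __name__ == "__main__":
    for d in (0.8, 1.0, 1.5, 2.0):  # hypothetical standardized effect sizes
        print(f"d = {d}: {n_per_group(d)} per group")
```

For d = 1 this gives 16 per group. An exact calculation based on the t distribution gives a slightly larger value (17), so the normal approximation is best read as a lower bound for small samples.
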

Extended Data Figure 1 | Measurement of endogenous and exogenous gene
expression in D. rerio. a, Corresponding representative embryos of each
morpholino knockdown on the left, with close-up of heart on the right. b, To
assess efficiency of morpholino knockdown, 20 embryos were collected 72 h after
injection, mRNA was collected, and quantitative PCR was performed with three
technical replicates. We demonstrate that morpholino (MO) knockdown of each
indicated gene results in reduced mRNA expression, after normalization to
beta-actin expression, compared to mock-injected controls (two-sided Student’s
t-test). P values are noted on graphs. c, Western blotting of 20 pooled embryos
injected with DCHS1 mRNA demonstrates the production of protein. Mutant mRNA
refers to the compound mutant P197L/R2513H.

Panel labels: Apbb1 RNA expression, E14.5 (Probe 1, Probe 2).

Extended Data Figure 2 | Apbb1 is not expressed during cardiac morphogenesis.
Apbb1 RNA expression was analysed at E14.5 in sagittal sections using 2 separate
antisense probes. Whereas strong cranial and neural expression is observed for
Apbb1, no detectable cardiac expression or valve expression (arrow) is evident.

Extended Data Figure 3 | Dachsous1b expression at the atrioventricular
junction. In situ hybridization reveals the presence of dchs1b in the
atrioventricular canal (avc) at 54 hpf (a, b) and 72 hpf (c). The dchs1b
expression is purple, while a counterstain for cardiac tissue is brown (a). White
arrows highlight the dchs1b signal in the atrioventricular canal.

Extended Data Figure 4 | Dachsous1b knockdown alters atrioventricular
ring markers. In situ hybridization at 48 hpf and 72 hpf, as indicated, was
performed for known atrioventricular ring markers. In contrast to WT (a), bmp4
expression is expanded into the ventricle at 48 hpf in dchs1 knockdown embryos (b);
spp1 and notch1b expression was largely unperturbed (c-f); and has2 expression
was not detected at 48 hpf and is faint at 72 hpf in dchs1 knockdown embryos,
compared to identically handled and stained controls (i-l).

Extended Data Figure 5 | Histopathology of mitral valves. Human posterior
leaflets of control, Barlow’s with MVP, and DCHS1 p.R2330C were isolated,
fixed and stained with Movat’s pentachrome. Leaflet thickening, elongation
and myxomatous degeneration are observed in the Barlow’s and DCHS1
p.R2330C leaflets compared to controls. Expansion of the proteoglycan layer
(blue) and disruption of the normal stratification of matrix boundaries are
observed in the Barlow’s and DCHS1 p.R2330C leaflets. Blue, proteoglycan;
yellow, collagen; black, elastin; red, fibrin or cardiac muscle. Scale bars, 0.5 cm.

Plot axes: % difference in protein level; groups: Mock, DCHS1, Mutant, P197L, R2513H.

Extended Data Figure 6 | Protein expression of uncoupled mutations. To
determine which family 1 DCHS1 mutation leads to the observed decrease in
protein expression, constructs were generated that harboured only the p.P197L
or only the p.R2513H variant. Mutant refers to the double mutant P197L/R2513H
construct. Western blot analyses of transfected HEK293 cells (three independent
biological replicates) demonstrate that the p.R2513H mutation causes a
significant decrease in DCHS1 protein expression, similar to that of the
construct with both variants (mutant), suggesting pathogenicity. Percent
difference in protein levels is depicted. Data were normalized by qPCR specific
to the transfected constructs. P values from Student’s t-test are indicated in graphs.

Extended Data Figure 7 | Cardiac function is not altered in Dchs1+/− mice.
M-mode analyses were performed to determine whether cardiac structure and/or
function were perturbed in Dchs1+/− mice. No statistically significant
differences were observed in either cardiac structure or calculated cardiac
function (n = 6 for each genotype). IVS, interventricular septum; d, diastole; s,
systole; LVID, left ventricular internal dimension; LVPW, left ventricular
posterior wall; EF, ejection fraction; FS, fractional shortening; LV, left ventricle.

Cardiac Function (Echocardiography, M-mode)

Parameter           Units  Dchs1+/+ (N=6)      Dchs1+/− (N=6)      P value
                           Mean      SD        Mean      SD
IVS;d               mm     0.821     0.188     0.910     0.141     0.170
IVS;s               mm     1.291     0.243     1.370     0.201     0.394
LVID;d              mm     4.283     0.516     4.045     0.217     0.285
LVID;s              mm     2.887     0.333     2.736     0.363     0.544
LVPW;d              mm     0.915     0.210     0.961     0.215     0.691
LVPW;s              mm     1.294     0.237     1.385     0.239     0.253
EF                  %      60.53     7.81      61.05     8.24      0.935
FS                  %      32.32     5.73      32.57     5.94      0.956
LV Mass             mg     148.93    42.74     150.76    40.75     0.892
LV Mass Corrected   mg     119.15    34.19     120.60    32.60     0.892
LV Vol;d            μl     83.76     23.58     72.15     9.35      0.261
LV Vol;s            μl     32.46     9.79      28.59     8.98      0.570


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 
Extended Data Figure 8 | Dchs1 expression during cardiac development.
Top, RNA expression of Dchs1 was analysed during embryonic gestation
(E11.5, E13.5 and E15.5) by section in situ hybridization. At E11.5, Dchs1 RNA
expression (blue staining) is observed in the endocardium and mesenchyme
of the superior and inferior cushions (sAVC and iAVC, respectively).
A gradient pattern of expression is observed at this time point, with more
intense expression near the endocardium. At E13.5 and E15.5, a similar pattern
is observed in the forming anterior and posterior mitral leaflets (AL and PL,
respectively). Bottom, Dchs1 protein expression (red) is observed throughout
cardiac development in the endothelial cells and interstitial cells of the
developing valves. Dchs1 shows asymmetric expression in the valvular
interstitial cell bodies around E15.5 (arrowheads). Dchs1 protein is also
observed in the epicardium and atrioventricular sulcus (arrows). (Red, Dchs1;
green, MF20; blue, Hoechst.)
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Extended Data Figure 9 | Dchs1 deficiency causes altered valvular 
interstitial cell patterning in vivo. a, IHC for eGFP of postnatal day 0 (PO) 
lineage traced Wt1-Cre/Rosa-eGFP/Dchs1*’* neonatal mice show epicardial- 
derived cells (EPDCs) migrating into the posterior leaflet as a sheet of cells 
directly under the endothelium of the atrialis. This normal patterning is 
perturbed in the Wt1-Cre/Rose-eGFP/Dchs1‘’~ mice. 3D reconstructions were 
used to examine all EPDCs in the posterior leaflet of both genotypes to 
obtain a complete fate map of these cells. b, c, Total volume of the leaflet is 
unchanged at this time point. However, the total volume of EPDCs as well 

as total EPDC cell number is significantly increased. There is a significant 
decrease in the number of non-EPDCs in the posterior leaflet with no overall 
change in total cell number. These data demonstrate that a minimum threshold 
of Dchs1 expression is required for normal migration of EPDCs into the 
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posterior leaflet, normal patterning of this cell population, and cross-talk 
between EPDC and non-EPDC cell types in the valve. **P < 0.01. d, Isolated 
anterior mitral leaflet from fetal (E17.5) Dchs1 */*" Dehs1*/, and Dchs1~/— 
mice were used to quantify cellular alignment of valvular interstitial cells. 
Vector maps were generated from histological (haematoxylin and eosin) stains 
to show orientation and alignment of cells in relationship to each other. 
Boxes in each vector map panel are represented as zoomed images of regions 
within each of the valves to show cell orientation. e, Cell alignment and polarity 
were quantified as the number of cells that deviate >10 degrees from the 
proximal-distal (P-D) axis of the leaflet. In Dchs1+/+ leaflets, 90% of the cells show
proper alignment with each other and along this P-D axis. Haploinsufficiency
(Dchs1+/−) results in a 50% reduction in cell alignment, which is further
reduced in Dchs1−/− (*P values < 0.01).
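The alignment metric in panel e (cells deviating by more than 10 degrees from the P-D axis) can be sketched as below. This is a minimal illustration, not the authors' analysis code: it assumes each cell's orientation has already been read from the vector map as an angle in degrees, with the P-D axis at 0 degrees, and the angles in the example are hypothetical.

```python
def axial_deviation_deg(angle_deg, axis_deg=0.0):
    """Smallest angle (0-90 degrees) between a cell's long axis and the P-D axis.

    Orientations are axial: a cell at 0 and one at 180 degrees point the same way,
    so the raw difference is folded into [0, 90].
    """
    d = abs(angle_deg - axis_deg) % 180.0
    return min(d, 180.0 - d)


def alignment_summary(angles_deg, axis_deg=0.0, threshold_deg=10.0):
    """Count cells deviating by more than threshold_deg, and the fraction aligned."""
    deviations = [axial_deviation_deg(a, axis_deg) for a in angles_deg]
    misaligned = sum(1 for d in deviations if d > threshold_deg)
    return misaligned, 1.0 - misaligned / len(deviations)


# Hypothetical cell orientations (degrees) read from a vector map, P-D axis at 0.
angles = [2.0, -5.0, 178.0, 9.5, 35.0, 95.0, 181.0, 4.0, -12.0, 7.0]
misaligned, aligned = alignment_summary(angles)
print(f"{misaligned} cells deviate > 10 degrees; {aligned:.0%} aligned")
```

Folding angles modulo 180 degrees matters here: a cell at 178 degrees is only 2 degrees off the axis and should count as aligned.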
Extended Data Figure 10 | Mice and MVP patients with Dchs1 deficiency
exhibit migratory defects in vitro. a, Posterior leaflets of P0 neonatal
Dchs1+/+ and Dchs1+/− mice were explanted and interstitial cells were allowed
to migrate out for 24 h. Dchs1+/− mice exhibit increased migration (black
lines drawn from explants) coincident with loss of cell-cell contacts and
N-cadherin expression at focal adhesions. Whereas N-cadherin expression
(red) is found at the membrane at points of cell-cell contact in Dchs1+/+
valvular interstitial cells (arrows), this membrane expression is lost in the
Dchs1+/− cells, in which N-cadherin is prominently expressed in the cytoplasm (arrows).
Nuclei, blue. b, Migration assays using control and MVP patient (p.R2330C)
valvular interstitial cells show an effect similar to that observed in the mouse cells,
whereby the p.R2330C cells exhibit an increase in migration. P values are
indicated in graphs.
PIK3CA-H1047R induces multipotency and
multi-lineage mammary tumours


Shany Koren, Linsey Reavie, Joana Pinto Couto, Duvini De Silva, Michael B. Stadler, Tim Roloff, Adrian Britschgi,
Tobias Eichlisberger, Hubertus Kohler, Olulanu Aina, Robert D. Cardiff & Mohamed Bentires-Alj


The adult mouse mammary epithelium contains self-sustained cell 
lineages that form the inner luminal and outer basal cell layers, 
with stem and progenitor cells contributing to its proliferative and 
regenerative potential’ *. A key issue in breast cancer biology is the 
effect of genomic lesions in specific mammary cell lineages on 
tumour heterogeneity and progression. The impact of transform- 
ing events on fate conversion in cancer cells of origin and thus their 
contribution to tumour heterogeneity remains largely elusive. 
Using in situ genetic lineage tracing and limiting dilution
transplantation, we have unravelled the potential of PIK3CA-H1047R, one
of the most frequent mutations occurring in human breast cancer’,
to induce multipotency during tumorigenesis in the mammary
gland. Here we show that expression of PIK3CA-H1047R in
lineage-committed basal Lgr5-positive and luminal keratin-8-positive cells
of the adult mouse mammary gland evokes cell dedifferentiation
into a multipotent stem-like state, suggesting this to be a
mechanism involved in the formation of heterogeneous, multi-lineage
mammary tumours. Moreover, we show that the tumour cell of
origin influences the frequency of malignant mammary tumours.
Our results define a key effect of PIK3CA-H1047R on mammary cell
fate in the pre-neoplastic mammary gland and show that the cell
of origin of PIK3CA-H1047R tumours dictates their malignancy,
thus revealing a mechanism underlying tumour heterogeneity
and aggressiveness.

The mammary gland epithelium is composed of two major cell
lineages: the luminal layer contains cells expressing keratin 8/18
(K8/18), and the basal layer contains cells expressing K5/14 and/or smooth
muscle actin (SMA) and/or p63 (ref. 6).

Multipotent cells that generate both the luminal and basal lineages 
are found in the mouse embryonic mammary gland’” but their
existence in the adult gland is still under debate. Studies using serial
transplantation into cleared mammary fat pads suggested the existence of
multipotent stem cells with myoepithelial features in the adult mouse
mammary gland*“’. Arguably, these assays reflected the regenerative
potential of the transplanted cells rather than their properties in situ’. 
Lineage-tracing studies, which permit targeted expression of a fluor- 
escent reporter in a given cell and its progeny, showed that tissue 
homeostasis is maintained by unipotent luminal K8/18-positive and 
basal K5/14/Lgr5-positive stem cells after birth. Lineage tracing of 
K8/18, K5/14 and Lgr5 progeny found no evidence for the presence 
of multipotent stem cells in the adult mammary gland' but did not 
exclude the possibility that rare cells not targeted by these reporters, or
targeted only at a very low frequency, are multipotent. While tracing
of the progeny of axin-2-positive cells showed the presence of multi- 
potent stem cells during puberty and pregnancy, this and other studies 
revealed that the basal and luminal lineages are self-sustained in the 
adult virgin gland**. By contrast, recent three-dimensional whole- 
mount imaging” and the identification of the Procr-positive subset'* 
argue for the presence of multipotent stem cells in the adult virgin 
mouse mammary gland, thus reopening the debate. 


The phosphatidylinositol 3-kinase (PI3K) pathway is activated in 
~70% of breast cancers. Several mechanisms may account for the 
activation of this pathway in cancer, including amplification and/or 
activating mutations of the PIK3CA gene that encodes the p110α
catalytic subunit of PI3K found in 20-40% of breast cancers”"*. The 
most recurrent mutation, H1047R, leads to constitutive PI3K signal- 
ling and heterogeneous mammary tumours’*'’. Despite frequent 
alterations of the PI3K pathway in breast cancer, its impact on lineage 
organization during tumorigenesis and the importance of the cell of 
origin for heterogeneity and aggressiveness of PI3K-driven tumours 
have remained unclear.

To address the effects of mutant PIK3CA-H1047R on basal- or
luminal-lineage-restricted cells, we performed lineage tracing in adult
Lgr5-CreERT2/Tomato-reporter and K8-CreERT2/Tomato-reporter mice
with or without PIK3CA-H1047R (Extended Data Fig. 1a)'®. Moreover,
we used Lgr5- and K8-CreERT2/PIK3CA-H1047R or PIK3CA wild-type
(PIK3CA-WT) animals'*'® for tracing of the green fluorescent protein
(GFP) reporter, and Lgr5-CreERT2 and K8-CreERT2 mice as controls
(Extended Data Fig. 1b). As previously reported’"’, we found Lgr5
activity only in a subset of basal cells in the nipple area of the
mammary gland (Extended Data Fig. 2a-e). We further assessed the effects
of PIK3CA-H1047R expression on the distribution of mammary
subpopulations by fluorescence-activated cell sorting (FACS) on isolated
mammary epithelial cells labelled with CD24 and Sca1, which were
shown to enrich for luminal (CD24hiSca1−, CD24hiSca1+) and basal
(CD24loSca1−) cells!®° (Extended Data Fig. 3). Four days of lineage
tracing confirmed Tomato labelling of basal cells in Lgr5-CreERT2/Tomato
and Lgr5-CreERT2/PIK3CA-H1047R/Tomato mice and of luminal
cells in K8-CreERT2/Tomato and K8-CreERT2/PIK3CA-H1047R/Tomato
animals (Extended Data Figs 2b, f and 4a, b). We further
performed 4-, 8- and 13-week lineage tracing. The distribution of
Tomato-labelled subsets in Lgr5-CreERT2/Tomato glands did not
change with time, and the progeny of Lgr5-positive cells were mostly
of basal origin (Fig. 1a, b and Extended Data Fig. 2g, h). In K8-CreERT2/Tomato
mice, labelling was restricted to the luminal subset, marking
mostly mature CD24hiSca1+ cells after 4 weeks and mostly CD24hiSca1−
luminal progenitors after 13 weeks of tracing, indicating the targeting of a
long-term unipotent luminal subset (Fig. 1a, c and Extended Data Fig.
4c, d). By contrast, expression of PIK3CA-H1047R in Lgr5-CreERT2/PIK3CA-H1047R/Tomato
and K8-CreERT2/PIK3CA-H1047R/Tomato mice
resulted in labelling of both the luminal and the basal compartments
(Fig. 1b, c and Extended Data Figs 2g, h, 4c, d). PIK3CA-H1047R-evoked
multi-lineage labelling was not observed at 4-7 days after tamoxifen
induction (Extended Data Figs 2f, i and 4b). Since the PIK3CA-H1047R and PIK3CA-WT
targeting vectors contain an internal ribosome entry site (IRES)-GFP
construct, we also used GFP as a readout of transgene expression.
At 4 days after tamoxifen induction, K8-CreERT2/PIK3CA-WT and
K8-CreERT2/PIK3CA-H1047R glands expressed similar levels of GFP,
indicating similar Cre recombination efficiency. In both models,
4- and 8-11-week lineage tracing revealed an increase in GFP-labelled


1Friedrich Miescher Institute for Biomedical Research (FMI), 4058 Basel, Switzerland. 2Swiss Institute of Bioinformatics, 4058 Basel, Switzerland. 3Department of Pathology, Center for Comparative Medicine, University of California Davis, Davis, California 95616, USA.


114 | NATURE | VOL 525 | 3 SEPTEMBER 2015 


©2015 Macmillan Publishers Limited. All rights reserved 




LETTER 


[Figure 1 image: a, lineage-tracing timeline (tamoxifen 3 × 2 mg; 4-, 8- and 13-week tracing); b, Lgr5-CreERT2 and c, K8-CreERT2 mice with Tomato or PIK3CA-H1047R/Tomato: K8/18, K14 and Tomato immunofluorescence at 4 and 8 weeks, and FACS quantification of Tomato-positive basal (CD24loSca1−) and luminal (CD24hiSca1−, CD24hiSca1+) cells.]


Figure 1 | Mutant PIK3CA induces mammary cell plasticity. a, Timeline for
lineage-tracing studies. b, c, Representative images of 4-, 8- and 13-week tracing
and FACS quantification of Tomato-positive epithelial basal (CD24loSca1−)
and luminal (CD24hiSca1−/+) subsets from Lgr5-CreERT2/Tomato
(b, immunofluorescence: left n = 3, middle n = 11, right n = 6 mice; FACS: left
n = 4 technical replicates (each 1-3 pooled mice), middle n = 3 technical
replicates (each 1-2 pooled mice), right n = 3 technical replicates (each 1
mouse)), Lgr5-CreERT2/PIK3CA-H1047R/Tomato (b, immunofluorescence: left
n = 3, middle n = 10, right n = 4 mice; FACS: left n = 3 technical replicates
(each 1-2 pooled mice), middle n = 6 technical replicates (each 1 mouse), right
n = 5 technical replicates (each 1-2 pooled mice)), K8-CreERT2/Tomato
(c, immunofluorescence: left n = 4, middle n = 8, right n = 5 mice; FACS: left
n = 4 technical replicates (each 1-3 pooled mice), middle n = 4 technical
replicates (each 1-2 pooled mice), right n = 5 technical replicates (each 1
mouse)) and K8-CreERT2/PIK3CA-H1047R/Tomato animals (c, immunofluorescence:
left n = 4, middle n = 4, right n = 3 mice; FACS: left n = 4 technical
replicates (each 1-2 pooled mice), middle n = 4, right n = 3 technical
replicates (each 1 mouse)). White arrowheads indicate luminal and yellow
arrowheads indicate basal Tomato-labelled cells. Scale bars, 100 μm, 20 μm
(magnifications). Bar graphs show means ± standard error of the mean
(s.e.m.); two-sided unpaired Student's t-test; *P < 0.05; NS, not significant.

basal and luminal subsets in PIK3CA-H1047R compared with PIK3CA-WT
animals, which showed lineage-restricted GFP labelling, consistent with
higher PI3K pathway activation (Extended Data Figs 5 and 6). These
results suggest that expression of PIK3CA-H1047R in basal- or
luminal-restricted mammary cells triggers lineage plasticity and cell expansion.

Next, we assessed the effects of PIK3CA-H1047R on global gene
expression in the pre-neoplastic gland. Microarray and quantitative
polymerase chain reaction with reverse transcription (qRT-PCR)
analyses were performed on mammary epithelial cell subpopulations
from Lgr5-CreERT2 control versus Lgr5-CreERT2/PIK3CA-H1047R and
K8-CreERT2 control versus K8-CreERT2/PIK3CA-H1047R animals.
Genes expressed differentially between subpopulations of control and
mutant mice were compared with subpopulation signatures that have
been previously described****. We found enrichment of luminal
progenitor signature genes in the basal Lgr5-CreERT2/PIK3CA-H1047R
subset and in the newly formed basal K8-CreERT2/PIK3CA-H1047R subset.
Enrichment of myoepithelial signature genes was found in the newly
formed luminal Lgr5-CreERT2/PIK3CA-H1047R and in the
K8-CreERT2/PIK3CA-H1047R luminal subsets (Fig. 2a, b and Extended Data Fig. 7).
Mammary cells co-expressing basal and luminal markers in neoplastic
areas confirmed that PIK3CA-H1047R induces cell plasticity (Fig. 2c).
We addressed how lineage plasticity is evoked by PIK3CA-H1047R in
functional assays. Limiting dilution transplantation revealed an
increase in the mammary-repopulating capacity of basal cells from
Lgr5-CreERT2/PIK3CA-H1047R and K8-CreERT2/PIK3CA-H1047R mice
compared with the respective controls (Fig. 3a). The GFP-negative
basal CD24loSca1− population from K8-CreERT2 mice comprises
the bulk of basal cells (myoepithelial and mammary-repopulating
cells)'°”°, in which repopulating cells are diluted among mostly myoepithelial
cells; this explains the lack of outgrowths at the numbers of cells
transplanted. PIK3CA-H1047R-expressing luminal cells also had
repopulating capacity (Extended Data Fig. 8a, b). All outgrowths expressed
luminal and basal markers (Fig. 3b and Extended Data Fig. 8c). In
colony formation assays, PIK3CA-H1047R increased the percentage of
double-positive (K14/K8/18) colonies derived from the newly formed
luminal (Lgr5-CreERT2/PIK3CA-H1047R) and basal cells
(K8-CreERT2/PIK3CA-H1047R). PIK3CA-H1047R-expressing subsets overcame lineage
restriction, giving rise to both lineages, albeit at low frequencies.
Moreover, basal cells from Lgr5- and K8-CreERT2/PIK3CA-H1047R
animals showed increased colony formation capacity (Extended Data
Fig. 8d-f). In mammosphere cultures, PIK3CA-H1047R increased the
sphere-forming capacity of luminal cells. While control cells formed
spheres with a hollow lumen after passaging, PIK3CA-H1047R-expressing
cells formed filled spheres, indicating the accumulation of less
differentiated cells (Extended Data Fig. 8g, h). Altogether, these data suggest
that PIK3CA-H1047R evokes cell dedifferentiation to a multipotent stem-like
state, from which cells further differentiate into both cell lineages.

Figure 2 | Activation of PIK3CA-H1047R leads to expression of basal- and
luminal-lineage genes. a, b, Plots indicating enrichment of gene expression
from FACS-sorted Lgr5-CreERT2/PIK3CA-H1047R versus Lgr5-CreERT2
control (a) and K8-CreERT2/PIK3CA-H1047R versus K8-CreERT2 control subsets
(b) in signatures of mammary subpopulations from refs 21, 22. Microarray
analysis was performed 4 weeks after tamoxifen induction on basal CD24loSca1− and
luminal CD24hiSca1−/+ subsets of pooled mammary glands of 2-3
oestrus-synchronized animals from three independent sortings. c, Representative
images of immunostaining for basal (K14, red; K5, white) and luminal (K8/18,
green) markers on mammary glands 4 weeks after tamoxifen treatment (n = 3
mice). White arrowheads indicate double-positive cells. Scale bars, 100 μm,
20 μm (magnifications).

Expression of PIK3CA-H1047R in Lgr5- and K8-positive cells induced
mammary tumours on average after 108 and 78 days, respectively.
Control and PIK3CA-WT animals developed no tumours (Fig. 4a).
FACS analysis of GFP-positive tumour cells revealed a similar
distribution of cancer subpopulations, with an accumulation of the
CD24hiSca1− subset. PIK3CA-H1047R-evoked mammary tumours
expressed basal and luminal markers (Extended Data Fig. 9a-d).
Additionally, cells double positive for basal and luminal markers were
found (Fig. 4b). These results suggest that PIK3CA-H1047R-evoked cell
plasticity results in multi-lineage tumours and that expression of basal 
and luminal markers is not an indicator of the origin of mammary 
cancers. It has been proposed that basal-like mammary tumours may 
originate from luminal cells****. Therefore, any inference of the cell of 
origin from the differentiation state of the tumour can be misleading. 

Histological analysis showed heterogeneous phenotypes and
differences in malignancy between the two models.
Lgr5-CreERT2/PIK3CA-H1047R mice formed unique benign multi-nodular rosette-type
adenomyoepitheliomas, aggressive adenosquamous carcinomas with
pilosebaceous differentiation and carcinosarcomas.
K8-CreERT2/PIK3CA-H1047R mice mainly developed aggressive adenosquamous
carcinomas, carcinosarcomas that infiltrated the surrounding tissue,
adenocarcinomas and benign adenomyoepitheliomas. These results
show that PIK3CA-H1047R mostly evokes benign tumours (74%) with
high intratumour heterogeneity when expressed in Lgr5-positive cells,
in contrast to the mostly aggressive mammary tumours (62%) with a
distinctive infiltrative, densely fibrotic phenotype seen when the cell of
origin is K8-positive (Fig. 4c and Extended Data Fig. 9e, f).

Figure 3 | Expression of PIK3CA-H1047R evokes multipotent stem-like cells.
a, Number of outgrowths in cleared-fat-pad transplantation of GFP-positive
(Lgr5-positive) control and PIK3CA-H1047R-expressing basal subsets (top).
Outgrowths of GFP-negative control and GFP-positive PIK3CA-H1047R-expressing
basal subsets (bottom). The GFP-negative control basal subset
comprises mammary repopulating cells and mostly myoepithelial cells. CI,
confidence interval; MRU, mammary repopulating unit; N/A, not applicable.
b, Representative immuno-stained sections (n = 3 mice) and carmine-stained
whole mounts of outgrowths from cleared-fat-pad transplantation (100 cells
condition). Scale bars, 50 μm (top three rows), 500 μm (bottom row).

a, b, Pooled data from three independent experiments.
Microarray analysis, principal component analysis and hierarchical
clustering of tumours revealed a single cluster of
Lgr5-CreERT2/PIK3CA-H1047R tumours and three clusters of
K8-CreERT2/PIK3CA-H1047R tumours. We found no correlation between tumour
phenotype and clustering, probably owing to intratumour heterogeneity. We
compared tumour expression profiles to different human breast cancer
subtypes’. The majority (7/10) of the K8-CreERT2/PIK3CA-H1047R
tumours correlated best with the malignant basal-like, HER2-enriched
and luminal-B profiles, whereas 3/10 clustered with benign luminal-A
and normal-like breast cancers. Lgr5-CreERT2/PIK3CA-H1047R
tumours resembled benign (2/10) but also malignant subtypes (3/10).
However, 5/10 showed no similarity to a single subtype but were
equidistant from all, reflecting the high intratumour heterogeneity
in this model (Fig. 4d and Extended Data Fig. 10). The fact that
K8-CreERT2/PIK3CA-H1047R tumours, but not Lgr5-CreERT2/PIK3CA-H1047R
tumours, clustered mostly with malignant breast cancers that have
a poor prognosis is consistent with the histopathological
results and suggests that, in the presence of the same initiating
oncogenic mutation, the cell of origin dictates the frequency of
aggressive tumours.
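The subtype comparison above (best-correlating subtype, or "equidistant from all") can be illustrated with a nearest-centroid call on Pearson correlation, in the spirit of centroid-based breast cancer subtyping. This is only a sketch under stated assumptions: the text does not specify the method, and the centroids, the four-gene profiles and the `margin` used to call a tumour "equidistant" are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def assign_subtype(profile, centroids, margin=0.1):
    """Nearest-centroid call by correlation; None if the top two are within margin."""
    ranked = sorted(
        ((pearson(profile, centroid), name) for name, centroid in centroids.items()),
        reverse=True,
    )
    (best_r, best_name), (second_r, _) = ranked[0], ranked[1]
    if best_r - second_r < margin:
        return None, ranked      # roughly equidistant from several subtypes
    return best_name, ranked


# Hypothetical centroids over four genes (not real subtype centroids).
centroids = {
    "basal-like": [2.0, -1.0, 0.0, -1.0],
    "luminal-A": [-1.0, 2.0, 1.0, -2.0],
    "normal-like": [0.0, 0.0, 1.0, -1.0],
}
call, ranked = assign_subtype([1.9, -0.8, 0.1, -1.2], centroids)
print(call, [(round(r, 2), name) for r, name in ranked])
```

Returning `None` when the two best correlations are close is one simple way to encode "no similarity to a single subtype"; the threshold is a tunable assumption.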

GFP driven by the Lgr5 promoter was found in a rare luminal subset 
of the control gland (Extended Data Fig. 5b, c), as shown previously’. It 
is unlikely that such rare cells expand owing to PIK3CA-H1047R and form
the luminal GFP-positive population, because we observed multi-lineage
labelling upon PIK3CA-H1047R expression in both a basal-cell-driven and a
luminal-cell-driven model. An alternative possibility that we cannot firmly
exclude is that the Lgr5- and K8-positive populations contain rare bipotent
subsets that are quiescent or not efficiently labelled under physiological
conditions and are, therefore, not detected by lineage tracing
but may expand upon PIK3CA-H1047R expression.

We show that expression of PIK3CA-H1047R dedifferentiates
lineage-restricted epithelial cells into a multipotent stem-like state from which
cells further differentiate, revealing a mechanism by which heterogen- 
eous mixed-lineage tumours may develop. Furthermore, we show that 
the tumour cell of origin influences the frequency of malignant mam- 
mary tumours. The fundamental questions of which mammary cells 
are susceptible to which combination of oncogenes and how this 
impinges on tumour progression and aggressiveness warrant further 
investigation. Understanding these dynamic relationships is para- 
mount for understanding tumour heterogeneity and for identifying 
prognostic and predictive biomarkers. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Mice. Lgr5-CreERT2 mice (C57BL/6) were generated and provided by B. Kinzel
and J. Tchorz’*. Mice were backcrossed to the FVB background and used in this
study. K8-CreERT2 mice (CD-1) were provided by C. Blanpain'. The generation of
PIK3CA-H1047R and PIK3CA-WT mice (pure FVB) was described previously'*"*.
Tomato-reporter animals (C57BL/6) were provided by B. Roska. For lineage-tracing
studies, mice of a mixed background (FVB/CD-1/C57BL/6 and FVB/C57BL/6) were
used. For studies without a Tomato reporter, mice with a pure FVB background
(Lgr5-CreERT2 model) or mixed FVB/CD-1 background (K8-CreERT2 model)
were used. Mouse colonies were maintained in the animal facility of the 
Friedrich Miescher Institute for Biomedical Research and experiments were car- 
ried out in accordance with Swiss national guidelines on animal welfare and the 
regulations of the canton of Basel-Stadt, Switzerland. 

Targeting Tomato and/or GFP expression. Adult 7- to 8-week-old female mice 
were induced by intraperitoneal injection of tamoxifen (Sigma; 2 mg per 25 g of 
body weight) for three consecutive days (diluted in sunflower seed oil, Sigma) to 
activate cell-specific expression of the Cre recombinase and thus expression of 
Tomato and/or the PIK3CA-H1047R or PIK3CA-WT transgene. Tomato and/or GFP were
expressed in recombined cells derived from Lgr5-positive or K8-positive cells.
For the Lgr5 model, after Cre induction, GFP expression was derived
from both Lgr5 promoter activity and PIK3CA-H1047R expression.
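The weight-adjusted tamoxifen dose (2 mg per 25 g of body weight, once daily for three days) can be computed per mouse as below. This is a minimal sketch: the 20 mg ml−1 oil stock used to convert dose into injection volume is a hypothetical value, since the stock concentration is not stated here.

```python
def tamoxifen_dose_mg(body_weight_g, mg_per_25_g=2.0):
    """Tamoxifen dose for one intraperitoneal injection, scaled to body weight."""
    return mg_per_25_g * body_weight_g / 25.0


def injection_volume_ml(dose_mg, stock_mg_per_ml):
    """Volume of oil stock to inject; the stock concentration is an assumption."""
    return dose_mg / stock_mg_per_ml


weight_g = 22.0
daily = tamoxifen_dose_mg(weight_g)                        # 2 mg x 22/25 = 1.76 mg
total = 3 * daily                                          # three consecutive daily doses
volume = injection_volume_ml(daily, stock_mg_per_ml=20.0)  # hypothetical 20 mg/ml stock
print(f"{daily:.2f} mg/day, {total:.2f} mg total, {volume * 1000:.0f} µl/day")
```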

Histology and immunostaining. Tumours and dissected mammary glands were 
spread on a glass slide and fixed in 20% formalin for 24 h at 4 °C. Samples were
then processed, embedded in paraffin and sectioned (3 μm). Immunofluorescence
studies were performed using a Ventana DiscoveryUltra instrument (Roche 
Diagnostics) following the RUO Discovery Universal method. Briefly, slides were 
pretreated with CC1 for 40 min and then incubated with primary antibodies for 
1 h at 37 °C. After washing, secondary antibodies were incubated for 32 min at
37 °C. Slides were then washed with reaction buffer (three times), PBS (two times)
and then incubated with DAPI (1 μg ml−1) for 5 min. Finally, slides were rinsed
with PBS (three times) and mounted in Mount Fluor (ProTaqs). The following 
antibodies were used: anti-keratin 8/18 (guinea-pig, 1:200, Fitzgerald, 20R- 
CP004), anti-RFP (rabbit, 1:500, Rockland, 600-401379), anti-keratin 14 (chicken, 
1:500, Covance, SIG-3476), anti-p63 (mouse, 1:500, Thermo Scientific, MA1-21871),
anti-keratin 5 (rabbit, 1:500, Abcam, ab52635), anti-chicken Alexa Fluor 488, anti- 
chicken Alexa Fluor 568, anti-guinea pig Alexa Fluor 488, anti-mouse Alexa Fluor 
488, anti-rabbit Alexa Fluor 647. All Alexa secondary antibodies were obtained 
from Molecular Probes (Invitrogen). Immunohistochemistry experiments were 
performed for keratin 14, keratin 8/18, GFP, and p63 using a Ventana 
DiscoveryXT instrument (Roche Diagnostics) following the Research IHC DAB 
Map XT procedure. Slides were treated with a mild CC1 and incubated with the 
primary antibodies for 1 h at 37 °C. After brief washes, biotinylated donkey
anti-rabbit and biotinylated anti-guinea-pig antibodies, respectively, were applied for 32 min at
37 °C. For mouse-anti p63 detection, a monoclonal rabbit-anti-mouse antibody 
was applied for 32 min at 37 °C, followed by incubation with a polymer anti-rabbit 
conjugated with horseradish peroxidase (HRP) (ImmPRESS anti-rabbit peroxi- 
dase, Vector Laboratories) for 32 min at 37 °C. Finally, sections were counter- 
stained with haematoxylin II and bluing reagent (4 min). Staining against ERα,
SMA and keratin 5 was performed by deparaffinization followed by antigen
retrieval with citrate buffer and quenching with PBS plus 3% H2O2. Slides were
then blocked with PBS plus 2.5% normal goat serum and primary antibodies 
incubated overnight at 4 °C in PBS plus 1% BSA plus 0.5% Tween-20. After brief 
washes, secondary antibodies were then incubated in PBS plus 1% BSA for 30 min 
at room temperature. Signals were enhanced using the Vectastain ABC system and 
visualized with 3,3’-diaminobenzidine (DAB; Sigma). Haematoxylin was used as 
counterstain. Anti-PR staining was performed without antigen retrieval. The fol- 
lowing antibodies were used: anti-keratin 8/18 (guinea-pig, 1:500, Fitzgerald, 20R- 
CP004), anti-keratin 14 (rabbit, 1:500, Thermo Scientific, Rb9020), anti-p63 
(mouse, 1:1,000, Thermo Scientific, MA1-21871), anti-keratin 5 (rabbit, 1:1,000,
Abcam, ab52635), anti-SMA (rabbit, 1:500, Thermo Scientific, Rb9010),
anti-ERα (rabbit, 1:1,000, Santa Cruz, sc-542), anti-PR (rabbit, 1:200, Thermo
Scientific, Rm9102), anti-GFP (rabbit, 1:50, Invitrogen, A11122). Secondary anti- 
bodies were biotinylated anti-rabbit IgG (1:200, Jackson Immunoresearch), bio- 
tinylated anti-guinea pig IgG (1:200; Vector Laboratories) and biotinylated anti- 
mouse IgG (1:200; Abcam). Haematoxylin and eosin staining was performed using 
standard protocols. 

Microscopy image acquisition. For immunofluorescence, images of stained 
sections were captured using a Zeiss Z1 wide-field fluorescent microscope,
×5/0.13, ×10/0.45 or ×20/0.8 (Plan-APOCHROMAT) objectives, an
AxioCam MRc camera (1,024 × 1,024, pixel size 6.45 μm) and an Axiocam 506
camera (2,752 × 2,208, pixel size 4.54 μm). Whole-mount fluorescent mammary
gland and tumour sections were imaged on the Zeiss Axio Scan.Z1 slide scanner
(ORCA-Flash4.0 camera, 2,048 × 2,048, pixel size 6.5 μm). Representative images
were cropped and processed using the ZenBlue software. For immunohistochemistry,
stained sections were examined using a Nikon E600 Eclipse brightfield
microscope (×20/0.5 and ×40/0.75 objectives) and images captured with a Nikon
DXM1200 camera (2,592 × 1,944, pixel size 6.7 μm) using the IMS acquisition
software. All images were scaled appropriately.
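Scaling images from the camera specifications above rests on one relation: the pixel size in the specimen plane is the camera pixel size divided by the total magnification. The sketch below assumes no extra camera-adapter magnification and no binning (`extra_mag=1.0`), which are assumptions rather than stated settings.

```python
def sample_pixel_size_um(camera_pixel_um, objective_mag, extra_mag=1.0):
    """Pixel size projected into the specimen plane (assumes no binning)."""
    return camera_pixel_um / (objective_mag * extra_mag)


def field_of_view_um(n_pixels, camera_pixel_um, objective_mag, extra_mag=1.0):
    """Width of the imaged field along an axis with n_pixels pixels."""
    return n_pixels * sample_pixel_size_um(camera_pixel_um, objective_mag, extra_mag)


def scale_bar_pixels(length_um, camera_pixel_um, objective_mag, extra_mag=1.0):
    """Length in pixels of a scale bar of length_um micrometres."""
    return round(length_um / sample_pixel_size_um(camera_pixel_um, objective_mag, extra_mag))


# AxioCam MRc (6.45 μm pixels, 1,024 px wide) behind the ×20 objective.
print(sample_pixel_size_um(6.45, 20))     # μm per pixel
print(field_of_view_um(1024, 6.45, 20))   # μm across the image
print(scale_bar_pixels(100, 6.45, 20))    # pixels for a 100-μm bar
```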

Quantification of immunohistochemistry. Five representative images of 8-15 
tumours of each genotype were captured with a Nikon E600 Eclipse brightfield 
microscope (×20/0.5 objective) and positively stained tumour areas, luminal cells,
and total epithelial cells quantified with ImageJ (Fiji). Evaluation of tissue sections
was performed blindly by two independent investigators. 
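One way to aggregate the five representative images per tumour is sketched below: a positive-area fraction per image, averaged per tumour, then a mean ± s.e.m. across tumours of a genotype. The per-tumour averaging is an assumption; the text does not say how image values were combined, and the pixel counts are hypothetical.

```python
import statistics


def positive_fraction(positive_px, tumour_px):
    """Fraction of tumour area that is positively stained in one image."""
    if tumour_px == 0:
        raise ValueError("image contains no tumour area")
    return positive_px / tumour_px


def tumour_score(images):
    """Mean positive fraction over the representative images of one tumour."""
    return statistics.mean(positive_fraction(p, t) for p, t in images)


def genotype_summary(tumours):
    """Mean and s.e.m. across tumours, each tumour contributing one value."""
    scores = [tumour_score(images) for images in tumours]
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return statistics.mean(scores), sem


# Hypothetical (positive pixels, tumour pixels) for two images of each of two tumours.
tumours = [[(50, 100), (30, 100)], [(20, 100), (40, 100)]]
print(genotype_summary(tumours))
```

Treating the tumour, not the image, as the unit keeps the five images from one tumour from being counted as independent replicates.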

Quantification of double-positive tumour epithelial area. Ten tumours from 
each model were stained with anti-keratin 14, anti-keratin 5, anti-keratin 8/18, 
anti-rabbit Alexa Fluor 647, anti-chicken Alexa Fluor 568, and anti-guinea-pig 
Alexa Fluor 488. Whole-mount tumour sections were scanned using the Zeiss 
Axio Scan Z1 slide scanner. For computational reasons, the whole images of 
tumours were first tiled and the channels split using the ZenBlue software and 
Matlab. Images were then processed using Ilastik’®, an interactive supervised 
machine learning toolkit, and the subsequent prediction maps were treated with 
batch functions written in the Matlab programming language. For the statistical
analysis shown in Fig. 4b, a total of 23 regions of tumours were subsequently tiled 
into 14,872 squared images of 1,024 X 1,024 pixels. In a second step, data from a 
set of nine selected tiles were generated by annotating a few regions representative 
of the predefined classes. For each of the three channels the training data were 
processed using the four phenotype classes: ‘1.background’, ‘2.fluorescent marker
(Alexa 488, 568 or 647)’, ‘3.blood cells’ and ‘4.stroma’. After annotation of the
pixels of the different classes by brush stroke, features of the labelled
pixels and their local neighbourhood were used to train a Random Forest classifier.
In a third step, the inferred classifier was used to predict all the tiles in a batch 
process. The last step implemented in Matlab determined the masks for all the 
channels and tiles. To perform the analysis on relevant tiles, that is, tiles sufficiently 
covered by tissue, we performed a k-means clustering on a set of statistical features 
extracted from the tiles. This enabled us to suppress tiles that were merely in the 
background or those that contained non-epithelial structures. For each tile, the 
fluorescent marker mask corresponded to pixels having a probability above 50% in 
the class ‘fluorescent marker (Alexa)’. In addition, a non-tissue mask was
calculated by means of morphological filters on the union of background, blood cells
and stroma classes. The mask for tissue represented the complement image of the 
non-tissue mask. We calculated on each tile the ratio between the areas of pixels in 
the mask of the fluorescent marker and those in the tissue mask. Finally, the 
distribution of double-labelled fluorescent marker within the whole tumour epi- 
thelial area was represented in Fig. 4b. 
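The tile-level steps described above (tiling, k-means filtering of tiles, a marker mask at probability > 50%, a morphologically filtered non-tissue mask and the marker-to-tissue area ratio) are sketched below in plain Python. This is a simplified illustration, not the authors' Matlab code: it assumes the ilastik output is available as per-pixel probabilities, uses a single hypothetical tile feature for k-means with k = 2, and assumes the morphological filter is a 3 × 3 binary opening.

```python
def tile_origins(height, width, tile=1024):
    """Top-left corners of non-overlapping tiles (edge tiles may be partial)."""
    return [(r, c) for r in range(0, height, tile) for c in range(0, width, tile)]


def two_means(values, iterations=50):
    """Minimal k-means with k = 2 on one tile-level feature; True = high cluster."""
    lo_c, hi_c = min(values), max(values)
    labels = [False] * len(values)
    for _ in range(iterations):
        labels = [abs(v - hi_c) < abs(v - lo_c) for v in values]
        high = [v for v, keep in zip(values, labels) if keep]
        low = [v for v, keep in zip(values, labels) if not keep]
        if not high or not low:
            break
        lo_c, hi_c = sum(low) / len(low), sum(high) / len(high)
    return labels


def _window(mask, r, c):
    """3 x 3 neighbourhood of (r, c), clipped at the tile border."""
    rows = range(max(r - 1, 0), min(r + 2, len(mask)))
    cols = range(max(c - 1, 0), min(c + 2, len(mask[0])))
    return [mask[i][j] for i in rows for j in cols]


def _morph(mask, op):
    return [[op(_window(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask):
    """Binary opening (erosion then dilation) with a 3 x 3 square element."""
    return _morph(_morph(mask, all), any)


def tile_ratio(marker_prob, nontissue_prob, threshold=0.5):
    """Marker-mask area divided by tissue-mask area for one tile."""
    marker = [[p > threshold for p in row] for row in marker_prob]
    nontissue = opening([[p > threshold for p in row] for row in nontissue_prob])
    marker_area = sum(v for row in marker for v in row)
    tissue_area = sum(not v for row in nontissue for v in row)
    return marker_area / tissue_area if tissue_area else float("nan")


# Hypothetical 9 x 9 tile: a 3 x 3 stained cell cluster with an unstained centre.
marker_prob = [[0.9 if 3 <= r <= 5 and 3 <= c <= 5 and (r, c) != (4, 4) else 0.1
                for c in range(9)] for r in range(9)]
nontissue_prob = [[1.0 - p for p in row] for row in marker_prob]
print(tile_ratio(marker_prob, nontissue_prob))
```

In the example, the opening removes the isolated unstained centre pixel from the non-tissue mask, so the tissue area is the full 3 × 3 cluster and the ratio is 8/9.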

Preparation of mammary single-cell suspensions and labelling. Mammary 
glands were dissected and intra-mammary lymph nodes removed. To obtain 
mammary organoids, mammary glands were processed as described
previously**!. To obtain single mammary epithelial cells, organoids were washed in
serum-free Leibovitz L-15 medium (Gibco) and digested with HyClone HyQTase
(Thermo Scientific). Single cells were washed, filtered through a 40-μm cell
strainer (BD Falcon) and counted; 10° cells per ml were stained with the following
antibodies: PE-Cy7-CD45 (BioLegend; clone 30-F11), APC-Sca1 (BioLegend;
clone E13-161.7), PerCP-Cy5.5-CD24 (BioLegend; clone M1/69), PE-CD49f
(BD Pharmingen), Alexa700-CD24 (Novus Biologicals; clone M1/69) and
DAPI (2 μg ml−1, Invitrogen).

Flow cytometry. FACS was carried out with a BD FACSAria III (Becton Dickinson) using a 100-µm nozzle. Cells were gated based on their forward- and side-scatter. Pulse width was used to exclude doublets. DAPI-negative/CD45-negative cells were gated for Tomato or GFP. CD24, Sca1 and CD49f subsets were then gated on GFP-positive epithelium. Tomato-positive cells were gated using only the CD24, CD45 and Sca1 antigens. The same numbers of living cells were recorded in each condition. FACS data were analysed using FlowJo (Tree Star). Total cell numbers were determined by enumerating the total epithelial content after single-cell isolation and back-calculating from the percentages obtained from antibody staining, sorting and FlowJo analysis. Cell numbers were subsequently normalized to one animal (3-9 independent FACS experiments of 1-5 pooled animals per time point were quantified; means ± s.e.m.; P < 0.05, Student’s t-test).
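
The back-calculation of cell numbers is simple arithmetic. The sketch below assumes that the gate percentages are read along the gating hierarchy and multiplied into one fraction of the total epithelial content counted after isolation. The helper names are hypothetical.

```python
import math
import statistics

def fraction_from_gates(gate_percentages):
    """Multiply nested gate percentages (each 0-100) into one overall fraction."""
    fraction = 1.0
    for pct in gate_percentages:
        fraction *= pct / 100.0
    return fraction

def cells_per_animal(total_epithelial_cells, gate_percentages, pooled_animals):
    """Back-calculate the cells in a gated subset and normalize to one animal."""
    if pooled_animals < 1:
        raise ValueError("at least one animal must be pooled")
    subset_cells = total_epithelial_cells * fraction_from_gates(gate_percentages)
    return subset_cells / pooled_animals

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Illustrative numbers only: 1,000,000 epithelial cells from 2 pooled animals;
# 80% pass the first gate and 25% of those fall in the subset of interest.
print(cells_per_animal(1_000_000, [80, 25], 2))
```

Per-animal values from each independent FACS experiment would then be summarized with `mean_sem`.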

In vitro colony formation assay and quantification. Freshly sorted cells of each subpopulation (500 cells) were plated as previously described. Seven days later, the colonies were fixed with acetone/methanol (1:1), washed, blocked with 2.5% normal goat serum and stained with anti-keratin 8/18 (guinea pig, 1:500, Fitzgerald, 20R-CP004), anti-keratin 14 (rabbit, 1:500, Thermo Scientific, Rb9020), DAPI (2 µg ml−1, Invitrogen), anti-guinea-pig Alexa Fluor 488 and anti-rabbit Alexa Fluor 647 (1:1,000, Invitrogen). Colonies were imaged using a Zeiss Z1 wide-field fluorescence microscope (×5/0.13 DIC). A colony was defined as a cluster of more than five cells. The number of colonies per well was determined manually. Colonies in which more than 20% of cells were keratin 8/18/keratin 14 double-positive were defined as ‘double-positive’.
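
The two counting rules above (more than five cells per colony, more than 20% double-positive cells) can be written as a small classifier. This is a sketch only. The `Cluster` record and the function names are hypothetical, and counting was done manually in the study.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    n_cells: int            # cells in the cluster
    n_double_positive: int  # keratin 8/18 and keratin 14 double-positive cells

def is_colony(cluster: Cluster) -> bool:
    # A colony is a cluster of more than five cells.
    return cluster.n_cells > 5

def is_double_positive(cluster: Cluster) -> bool:
    # A 'double-positive' colony has more than 20% double-positive cells.
    return is_colony(cluster) and cluster.n_double_positive / cluster.n_cells > 0.20

def summarize_well(clusters):
    """Return (number of colonies, number of double-positive colonies) for one well."""
    colonies = [c for c in clusters if is_colony(c)]
    return len(colonies), sum(is_double_positive(c) for c in colonies)

well = [Cluster(5, 5), Cluster(6, 1), Cluster(10, 3), Cluster(20, 4)]
print(summarize_well(well))  # (3, 1): the 5-cell cluster is not a colony; 4/20 is not > 20%
```

Both thresholds are strict inequalities, as in the text, so a colony with exactly 20% double-positive cells is not counted as double-positive.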

In vitro mammosphere culture. Mammosphere cultures were performed as described previously. Freshly sorted luminal subsets (CD24hiSca1− and CD24hiSca1+) from adult FVB and uninduced PIK3CAH1047R females were plated at 20,000 cells per ml in 6-well ultra-low attachment plates (Falcon) in DMEM/F12 medium (Gibco) supplemented with 5 µg ml−1 insulin, 0.5 µg ml−1 hydrocortisone, 2% B27 (Invitrogen), 20 ng ml−1 EGF and bFGF (BD Biosciences) and cholera toxin (Sigma), and cultured at 37 °C in 5% CO2. After 5 days, mammospheres were collected, dissociated into single cells using HyQTase (Gibco) and counted. Luminal subsets from FVB control and PIK3CAH1047R animals were subsequently treated overnight with TAT-Cre (Millipore; 0.5 µM) at a density of 20,000 cells per ml in 4 ml at 37 °C in 5% CO2. The next morning, the medium was replaced and cells were cultured for 72 h at 37 °C in 5% CO2 to allow for maximal recombination and expression of PIK3CAH1047R. After 72 h, PIK3CAH1047R-GFP+/− cells were sorted and plated into 24-well ultra-low attachment plates in the medium described above, supplemented with 2% Matrigel (growth factor reduced; BD, 356230), at a density of 1,000 cells per well. Control and PIK3CAH1047R spheres from each subset were enumerated every 7 days, at which point spheres were dissociated and re-plated at a density of 1,000 cells per well.

Mammary fat pad transplantation. For limiting dilution transplantation, freshly sorted cells from Lgr5-CreERT2 control and Lgr5-CreERT2/PIK3CAH1047R animals (pure FVB background) and from K8-CreERT2 control and K8-CreERT2/PIK3CAH1047R F1-hybrid (FVB/CD-1) littermates were used. Sorted cells were resuspended at limiting dilution numbers in PBS plus 2% FCS with 25% Matrigel (growth factor reduced; BD, 356230) and injected in 20-µl volumes into the inguinal glands of 3-week-old FVB females that had been cleared of endogenous mammary epithelium. Cells from control and from PIK3CAH1047R animals were injected into the same recipient on opposite sides. After 8 weeks, the recipients' glands were removed for evaluation. Glands were spread on a glass slide and fixed in Carnoy’s fixative overnight. Whole-mount staining with carmine alum was performed as previously described, and glands were scanned with an Epson 1600 Pro scanner. An outgrowth was defined as an epithelial structure composed of ducts arising from a central point with lobules and/or terminal end buds. Frequencies of mammary-repopulating units in the different cell populations were calculated and statistically compared using the Extreme Limiting Dilution Analysis (ELDA) online tool (http://bioinf.wehi.edu.au/software/elda/).
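
ELDA is built on a single-hit Poisson model. Under that model, a fat pad injected with d cells lacks an outgrowth with probability exp(−f·d), where f is the frequency of repopulating units. The sketch below estimates f by maximum likelihood under that model only. It illustrates the principle and is not ELDA's implementation: it omits the confidence intervals, goodness-of-fit checks and between-group tests that the tool reports. The function names are hypothetical.

```python
import math

def _score(f, doses, tested, positive):
    """Derivative in f of the single-hit log-likelihood
    sum_i [ r_i * log(1 - exp(-f d_i)) - (n_i - r_i) * f d_i ]."""
    s = 0.0
    for d, n, r in zip(doses, tested, positive):
        p_outgrowth = -math.expm1(-f * d)  # 1 - exp(-f d), computed accurately for small f d
        s += r * d * math.exp(-f * d) / p_outgrowth - (n - r) * d
    return s

def estimate_frequency(doses, tested, positive, iterations=200):
    """Maximum-likelihood frequency f of repopulating units (1/f = '1 in N cells').

    doses: cells injected per fat pad; tested: fat pads injected at each dose;
    positive: fat pads with an epithelial outgrowth at each dose.
    """
    if sum(positive) == 0:
        raise ValueError("no outgrowths: the MLE of f is 0")
    if all(r == n for n, r in zip(tested, positive)):
        raise ValueError("outgrowths in every fat pad: the MLE of f is unbounded")
    lo, hi = 1e-12, 1.0
    # The score is positive near f = 0 and decreases monotonically in f,
    # so widen the upper bracket until the score is negative, then bisect.
    while _score(hi, doses, tested, positive) > 0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if _score(mid, doses, tested, positive) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `estimate_frequency([1000, 100, 10], [6, 6, 6], [5, 2, 0])` returns an estimate of f, and `1 / f` is then the estimated number of cells per mammary-repopulating unit.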

Immunoblotting. Lysates from mammary glands were prepared by lysing cryo-homogenized mammary gland powder in RIPA buffer (50 mM Tris-HCl pH 8, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS) supplemented with 1× protease inhibitor cocktail (Complete Mini, Roche), 0.2 mM sodium orthovanadate, 20 mM sodium fluoride and 1 mM phenylmethylsulfonyl fluoride. Lysates (30-80 µg) were subjected to SDS-PAGE, transferred to PVDF membranes (Immobilon-P, Millipore) and blocked for 1 h at room temperature with 5% milk or BSA in PBS/0.05% Tween 20. Membranes were then incubated overnight with primary antibodies (1:200-1:3,000) and exposed to HRP-coupled anti-mouse or anti-rabbit secondary antibodies at 1:5,000-1:10,000 for 1 h at room temperature. Results are representative of at least three different experiments. The following antibodies were used: anti-AKT pan (Cell Signaling), anti-pAKT (Ser473, Cell Signaling), anti-p110α (Cell Signaling), anti-ERK2 (Santa Cruz) and anti-keratin pan (Santa Cruz). Blot densities were quantified using ImageJ and normalized to pan-keratin to account for epithelial content.
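
The density normalization can be expressed as band ratios rescaled to the control group. The sketch below assumes that densities are already background-subtracted ImageJ values. It also assumes that fold change is taken over the mean of the control lanes, following the legend of Extended Data Fig. 6, which reports fold change over control lysate. The function name is hypothetical.

```python
import statistics

def fold_change_over_control(target, keratin, control_lanes):
    """Normalize band densities to pan-keratin, then divide by the mean control ratio.

    target, keratin: densities for the same lanes, in lane order.
    control_lanes: indices of lanes loaded with control lysate.
    """
    ratios = [t / k for t, k in zip(target, keratin)]  # epithelial-content correction
    control_mean = statistics.mean([ratios[i] for i in control_lanes])
    return [r / control_mean for r in ratios]

# Lanes 0-1 control, lane 2 mutant (illustrative numbers, not data from the study).
print(fold_change_over_control([2.0, 2.0, 6.0], [1.0, 2.0, 2.0], [0, 1]))
```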

RNA isolation. For pre-neoplastic gene expression profiling, mammary epithelial 
subsets were FACS sorted as described earlier into extraction buffer and total RNA 
from 250 or 2,000 sorted cells was isolated using an Arcturus PicoPure RNA 
Isolation Kit (Life Technologies). Subsets of pooled mammary glands of 2-3 
oestrus-synchronized animals (confirmed by vaginal smear) per genotype in three 
independent sortings were collected for microarray analysis. RNA from mammary 
subsets of four animals per genotype was collected for qRT-PCR. For tumour gene 
expression profiling and expression of Lgr5 in the mammary gland, total RNA was extracted from 50 mg of cryo-homogenized tumour tissue or from the nipple and distal area of mammary glands using the TRIzol method (Life Technologies) according to the manufacturer’s instructions. Genomic DNA was removed by DNase I digestion (Qiagen) and RNA was purified using an RNeasy Plus Mini Kit (Qiagen). RNA concentration was measured with a NanoDrop 1000 spectrophotometer and RNA quality was assessed using an Agilent 2100 Bioanalyzer with RNA Pico Chips or RNA Nano Chips.

Microarray. Total RNA from 250 sorted cells (pre-neoplastic Lgr5), 2,000 sorted 
cells (pre-neoplastic K8) or 100 ng from tumour tissue was used as the input for 
synthesis of amplified cDNA with the NuGen Ovation Pico WTA System 
(NuGen). The resulting double-stranded cDNA was fragmented and labelled using the Affymetrix GeneChip WT Terminal Labelling kit (Affymetrix). Affymetrix GeneChip Mouse Gene 1.0 ST microarrays were hybridized according to the GeneChip Whole Transcript (WT) Sense Target Labelling Assay Manual (Affymetrix) with a hybridization time of 16 h. Scanning was performed with Affymetrix GCC Scan Control Software v. 3.0.0.1214 on a GeneChip Scanner 3000 7G with autoloader.

Normalization and analysis of mammary epithelial subset microarray data. 
The arrays for Gene Expression Omnibus accession numbers GSE40875 (ref. 21), 
GSE59870 (Fig. 2a) and GSE65411 (Fig. 2b) data sets were RMA normalized using
the Bioconductor package affy (R 3.0.1/Bioconductor 2.13). Heat maps with
unscaled normalized expression values for selected genes were plotted with the 
heatmap.2 function of the gplots package. Differential gene expression for the 
comparison of mature luminal (LM) or luminal progenitor (LP) versus myoe- 
pithelial (MYO) cells was calculated with limma. The function topTable was 
used to select the top 300 upregulated genes as luminal mature versus myoepithe- 
lial UP (Luminal_Mature UP, MEIER-ABT) and luminal progenitors versus 
myoepithelial UP (Luminal_progenitors UP, MEIER-ABT), and the top 300 
upregulated genes as myoepithelial versus luminal mature UP (Myoepithelial 
UP, MEIER-ABT). Signatures LIM_MAMMARY_LUMINAL_MATURE_DN, 
LIM_MAMMARY_LUMINAL_MATURE_UP, LIM_MAMMARY_LUMINAL_ 
PROGENITOR_DN, LIM_MAMMARY_LUMINAL_PROGENITOR_UP, LIM_ 
MAMMARY_STEM_CELL_DN and LIM_MAMMARY_STEM_CELL_UP were 
downloaded from http://www.broadinstitute.org/gsea/msigdb/search.jsp in gmt 
format, combined with the signatures described earlier, and used in the PGSEA 
and smcPlot functions of the Bioconductor PGSEA package. 
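
The signature-building step can be illustrated as follows. limma ranks genes with empirical Bayes moderated statistics. The plain mean-difference ranking below is therefore a deliberately simplified stand-in and will not reproduce the published gene lists. The GMT writer follows the tab-separated MSigDB layout: set name, description, then genes. Function and set names are hypothetical.

```python
import statistics

def top_upregulated(expr, group_a, group_b, n_top=300):
    """Genes with the largest positive mean log2 difference (group_a minus group_b).

    expr: dict mapping gene -> list of normalized log2 values, one per array.
    group_a, group_b: array (column) indices for the two cell populations.
    Simplification: limma's moderated statistics are NOT reproduced here.
    """
    diffs = {
        gene: statistics.mean(v[i] for i in group_a) - statistics.mean(v[i] for i in group_b)
        for gene, v in expr.items()
    }
    up = sorted((g for g, d in diffs.items() if d > 0), key=lambda g: diffs[g], reverse=True)
    return up[:n_top]

def to_gmt_line(name, description, genes):
    """One line of a GMT gene-set file: name, description, then genes, tab-separated."""
    return "\t".join([name, description, *genes])

expr = {"A": [5, 5, 1, 1], "B": [2, 2, 2, 2], "C": [4, 4, 1, 1], "D": [0, 0, 3, 3]}
genes = top_upregulated(expr, group_a=[0, 1], group_b=[2, 3], n_top=2)
print(to_gmt_line("Luminal_Mature_UP_example", "na", genes))
```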

qRT-PCR. RNA was converted into cDNA using SuperScript III Reverse Transcriptase (Invitrogen). Quantitative real-time PCR was performed on unamplified cDNA normalized to the number of sorted cells for each subset (6,000-10,000 sorted cells), using TaqMan probes (Life Technologies) and TaqMan Universal PCR Master Mix (Applied Biosystems). The following TaqMan probe identifiers were used: Krt14 Mm00516879_m1, Krt5 Mm01305291_g1, Lgr5 Mm00438890_m1, Vim Mm01333430_m1, Krt8 Mm00835759_m1, Krt18 Mm01601702_g1, Gata3 Mm00484683_m1, Elf5 Mm00468732_m1, Csn2 Mm04207885_m1 and Axin2 Mm00443610_m1. Cycling was performed with a StepOnePlus Real-Time PCR System (Applied Biosystems). The results are representative of three qRT-PCR experiments of pooled mammary subsets from four animals of each genotype. Results for Csn2 (CD24hiSca1−/+) and Krt14 (CD24hiSca1+) from K8-CreERT2/PIK3CAH1047R cells and for Krt5 and Lgr5 (CD24hiSca1−) from Lgr5-CreERT2/PIK3CAH1047R cells are representative of two experiments. Statistical data analysis was performed using ΔCT values.
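
The text states only that statistics were computed on ΔCT values. The cDNA input was normalized to sorted-cell number rather than to a reference gene. One plausible reading, assumed here, is therefore that ΔCT compares the same gene between a sample and a reference condition. Converting ΔCT to a relative quantity further assumes near-100% amplification efficiency. A minimal sketch:

```python
def delta_ct(ct_sample: float, ct_reference: float) -> float:
    """ΔCT = CT(sample) − CT(reference); a lower CT means more starting template."""
    return ct_sample - ct_reference

def relative_quantity(ct_sample: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """(1 + E) ** (−ΔCT); E = 1.0 assumes the product doubles every cycle."""
    return (1.0 + efficiency) ** (-delta_ct(ct_sample, ct_reference))

print(relative_quantity(23.0, 25.0))  # ΔCT = −2 → 4.0
```

Testing on ΔCT rather than on fold changes keeps the comparison on the log scale on which qPCR measurements are roughly symmetric.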

Normalization and analysis of tumour mouse microarray data. Mouse Affymetrix microarrays (Gene Expression Omnibus accession number GSE59872) were background corrected and quantile normalized, and log2 transcript cluster expression values were calculated using the rma() function of the Bioconductor package “oligo”. Transcript cluster identifiers were mapped to Entrez Gene identifiers using the Bioconductor package “mogene10sttranscriptcluster.db”, and transcript clusters associated with no gene or with multiple genes were removed. Where multiple transcript clusters were associated with a single gene, the one with the maximal variance across samples was selected; this resulted in a total of 20,365 transcript clusters with unique gene assignments. Principal component analysis was performed on log2 expression values from which the mean over samples had been subtracted for each gene. Hierarchical clustering of samples was performed using (1 − r) as the distance metric, with r being Pearson’s correlation coefficient for log2 expression values between each pair of samples. The clustering dendrogram was visualized using the “dendextend” R package.
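
The collapsing, PCA and distance steps described above map onto a few array operations. The NumPy sketch below assumes a transcript-cluster-by-sample matrix of log2 values and a precomputed mapping of each transcript cluster to a set of Entrez Gene IDs. The original analysis used R/Bioconductor, and these Python helper names are hypothetical.

```python
import numpy as np

def collapse_to_genes(expr, cluster_genes):
    """Keep one transcript cluster per gene: the one with maximal variance across samples.

    expr: (n_clusters, n_samples) array of log2 expression values.
    cluster_genes: one set of Entrez Gene IDs per transcript cluster; clusters
    mapped to no gene or to several genes are discarded.
    """
    variances = expr.var(axis=1)
    best = {}  # gene -> (variance, row index)
    for row, genes in enumerate(cluster_genes):
        if len(genes) != 1:
            continue
        (gene,) = genes
        if gene not in best or variances[row] > best[gene][0]:
            best[gene] = (variances[row], row)
    kept = sorted(best)
    return kept, expr[[best[g][1] for g in kept], :]

def pca_scores(expr, n_components=2):
    """Sample coordinates on the leading principal components of gene-centred data."""
    centred = expr - expr.mean(axis=1, keepdims=True)  # subtract each gene's mean over samples
    u, s, _ = np.linalg.svd(centred.T, full_matrices=False)  # rows of centred.T are samples
    return u[:, :n_components] * s[:n_components]

def correlation_distance(expr):
    """(1 - r) for every pair of samples, r = Pearson correlation over genes."""
    return 1.0 - np.corrcoef(expr, rowvar=False)
```

The (1 − r) matrix can be passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage` after conversion with `scipy.spatial.distance.squareform(d, checks=False)`. The linkage method used in the original analysis is not stated.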

Analysis of human TCGA breast cancer data and comparison to mouse. 
Human breast cancer expression data and corresponding clinical data were obtained from https://tcga-data.nci.nih.gov/docs/publications/brca_2012/, corresponding to the 11 November 2011 data freeze, which contains 522 tumour samples with clinical annotation. Human gene symbols were mapped to Entrez Gene identifiers. Genes were then selected that are one-to-one homologues between human and mouse according to HomoloGene (build 68, downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/build68/) and that were measured on both the human and mouse experimental platforms; this resulted in 13,969 genes. To reduce technical differences between human and mouse samples, all samples were scaled to a standard deviation of 1, and the within-species mean was subtracted from each gene. The heat map and clustering of the combined human and mouse expression data were produced using the function aheatmap() from the R package “NMF” on the top 1,000 genes ranked by variance
across human samples. Mouse samples were compared with human PAM50 
tumour subtypes (Normal-like, Luminal A, Luminal B, HER2-enriched and 
Basal-like) by calculating Pearson’s correlation coefficients for each mouse 
sample against the averages of all human samples within each subtype. 
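
The cross-species harmonization and subtype comparison can be sketched with NumPy under a few assumptions. Expression matrices are genes × samples, with rows already aligned to the same one-to-one homologues. ‘Scaled to a standard deviation of 1’ is read as dividing each sample by its standard deviation across genes (sample s.d., without centring the sample). Subtype centroids are the means of the harmonized human profiles. Helper names are hypothetical.

```python
import numpy as np

def harmonize(expr):
    """Scale each sample (column) to s.d. 1, then subtract each gene's within-species mean.

    Assumption: the s.d. is the sample s.d. of each column across genes, and
    samples are rescaled without being centred.
    """
    scaled = expr / expr.std(axis=0, ddof=1, keepdims=True)
    return scaled - scaled.mean(axis=1, keepdims=True)

def subtype_correlations(mouse, human, human_subtypes):
    """Pearson r between each mouse sample and each human subtype centroid.

    mouse: (n_genes, n_mouse); human: (n_genes, n_human); rows refer to the same
    one-to-one homologues in the same order. Returns {subtype: r per mouse sample}.
    """
    mouse_h, human_h = harmonize(mouse), harmonize(human)
    labels = np.asarray(human_subtypes)
    result = {}
    for subtype in sorted(set(human_subtypes)):
        centroid = human_h[:, labels == subtype].mean(axis=1)  # average human profile
        result[subtype] = np.array(
            [np.corrcoef(mouse_h[:, j], centroid)[0, 1] for j in range(mouse_h.shape[1])]
        )
    return result
```

Each entry of the returned dictionary holds one correlation per mouse sample, matching the per-subtype comparison described above.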

Survival analysis. Tumour incidence was determined by palpation. Kaplan-Meier plots were generated using the survival analysis tool of GraphPad Prism, and significance was calculated using the log-rank test.
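
The survival analysis (performed in GraphPad Prism) rests on the Kaplan-Meier product-limit estimator and the log-rank test. Below is a standard-library Python sketch of both for two groups. It assumes that each animal contributes either a time to first palpable tumour (event = 1) or a censoring time (event = 0). The function names are hypothetical, and the P value uses the chi-square distribution with 1 degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate as [(t, S(t))] at each distinct event time.

    times: time to first palpable tumour, or to censoring.
    events: 1 if a tumour was detected at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # censored animals leave the risk set after t
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic with 1 d.f., P value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for ti, _, g in pooled if g == 0 and ti >= t)  # group A at risk
        n = sum(1 for ti, _, _ in pooled if ti >= t)                # both groups at risk
        d_a = sum(1 for ti, e, g in pooled if g == 0 and e and ti == t)
        d = sum(1 for ti, e, _ in pooled if e and ti == t)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 d.f.
    return chi2, p_value
```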

Statistical data analysis. The number of mice was determined by power analysis using data from small pilot experiments. Values represent means ± s.e.m. or ± s.d. Depending on the type of experiment, data were tested using an unpaired Student’s t-test or a log-rank test. P < 0.05 was considered statistically significant. The experiments were not randomized.
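
For reference, the unpaired Student’s t-test pools the two group variances. The standard-library sketch below returns the t statistic and its degrees of freedom (the function name is hypothetical). The two-sided P value follows from the t distribution with that many degrees of freedom, for instance via `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives this pooled-variance test.

```python
import math
import statistics

def students_t(a, b):
    """Unpaired two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df

t, df = students_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4
```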

Extended Data Figure 1 | Scheme depicting mouse lines generated for lineage-tracing studies. a, Lgr5-CreERT2 (ref. 25) or K8-CreERT2 (ref. 1) animals were crossed to transgenic lox-STOP-lox PIK3CAH1047R (ref. 16) and/or Tomato-reporter mice, generating Lgr5-CreERT2/Tomato, K8-CreERT2/Tomato, Lgr5-CreERT2/PIK3CAH1047R/Tomato and K8-CreERT2/PIK3CAH1047R/Tomato animals for lineage-tracing studies. Lgr5-CreERT2/Tomato and K8-CreERT2/Tomato animals were used as controls. b, Lgr5-CreERT2 (ref. 25) and K8-CreERT2 (ref. 1) animals were crossed to lox-STOP-lox PIK3CAH1047R (ref. 16) or PIK3CAWT (ref. 18) animals. Lgr5-CreERT2 and K8-CreERT2 animals were used as controls. Tamoxifen injection induces PIK3CAH1047R, PIK3CAWT and/or Tomato expression.

Extended Data Figure 2 | Lgr5-CreERT2/Tomato and PIK3CAH1047R/Tomato labelling in the mammary nipple area. a, Lgr5 expression in the nipple and distal area of Lgr5-CreERT2 glands (n = 3 mice). b, Tracing scheme. c, d, Representative images of mammary glands after 4 weeks of tracing (n = 3 mice for each genotype). Scale bars, 2 mm; 100 µm (magnifications). e, Representative haematoxylin and eosin staining of an Lgr5-CreERT2/PIK3CAH1047R mammary gland with a tumour. Scale bar, 500 µm. LN, lymph node. f, Representative images, FACS plots and quantification of 4 days of tracing (24 h after the last tamoxifen injection) (top: immunofluorescence, n = 3 mice; FACS, n = 5 technical replicates (each 1-2 pooled mice); bottom: immunofluorescence, n = 3 mice; FACS, n = 3 technical replicates (each 1 mouse)). Scale bars, 100 µm; 20 µm (magnifications). g, Representative FACS plots of 4-week tracing. h, Percentage of total Tomato-positive cells in the tracing experiments (Lgr5-CreERT2/Tomato: 4 days n = 5, 4 weeks n = 4, 8 and 13 weeks n = 3 technical replicates (each 1-2 pooled mice); Lgr5-CreERT2/PIK3CAH1047R/Tomato: 4 days n = 3, 4 weeks n = 3, 8 weeks n = 6 and 13 weeks n = 5 technical replicates (each 1-2 pooled mice)). i, Representative images of 7 days of tracing (left, n = 4 mice; right, n = 2 mice). Scale bars, 100 µm; 50 µm (magnifications). Bar graphs show means ± s.e.m.; two-sided unpaired Student’s t-test; *P < 0.05; NS, not significant.

Extended Data Figure 3 | Gating scheme for FACS experiments. a-d, Representative FACS plots of K8-CreERT2/Tomato (a), K8-CreERT2/PIK3CAH1047R/Tomato (b), Lgr5-CreERT2/PIK3CAWT (c) and Lgr5-CreERT2/PIK3CAH1047R (d) animals 4 weeks after tamoxifen injection. The gating strategy shown illustrates the elimination of doublets, dead cells (DAPI+) and white blood cells (CD45+), and the sorting of Tomato- or GFP-positive mammary epithelial subsets (basal CD24loSca1−; luminal CD24hiSca1−; luminal CD24hiSca1+).

Extended Data Figure 4 | K8-CreERT2/Tomato and PIK3CAH1047R/Tomato labelling in the mammary gland. a, Scheme depicting the timeline of tracing experiments. b, Representative images and FACS quantifications of K8-CreERT2/Tomato and K8-CreERT2/PIK3CAH1047R/Tomato mammary glands 4 days after tamoxifen (24 h after the last tamoxifen injection) (top: immunofluorescence, n = 5 mice; FACS, n = 7 technical replicates (each 1-2 pooled mice); bottom: immunofluorescence, n = 3 mice; FACS, n = 3 technical replicates (each 1 mouse)). Scale bars, 100 µm; 20 µm (magnifications). c, Representative FACS plots of 4-week Tomato tracing. d, Percentage of total Tomato-positive cells in mammary glands (K8-CreERT2/Tomato: 4 days n = 7, 4 and 8 weeks n = 4, 13 weeks n = 5 technical replicates (each 1-3 pooled mice); K8-CreERT2/PIK3CAH1047R/Tomato: 4 days n = 3, 4 and 8 weeks n = 4 and 13 weeks n = 3 technical replicates (each 1-2 pooled mice)). Bar graphs show means ± s.e.m.; *P < 0.05; two-sided unpaired Student’s t-test.

Extended Data Figure 5 | Tracing of GFP-positive mammary subsets. a, Percentage of GFP-labelled cells in K8-CreERT2/PIK3CAH1047R versus K8-CreERT2/PIK3CAWT animals 4 days after tamoxifen (24 h after the last tamoxifen injection) (n = 3 technical replicates, 2 mice per genotype). b, d, Representative FACS plots and percentages of GFP-positive cells in mammary gland subsets and total mammary epithelial cells 4 and 8-11 weeks after tamoxifen. c, e, Bar graphs showing total numbers of GFP-positive cells and numbers of GFP-positive cells in basal (CD24loSca1−) and luminal (CD24hiSca1−; CD24hiSca1+) subsets of Lgr5-CreERT2/PIK3CAH1047R (c) and K8-CreERT2/PIK3CAH1047R (e) mammary epithelial cells. b, c, 4 weeks: non-induced control n = 3, control n = 9, PIK3CAWT n = 3, PIK3CAH1047R n = 9 sortings, each of 1-4 pooled mice; 8-11 weeks: non-induced control n = 3, control n = 3, PIK3CAWT n = 3, PIK3CAH1047R n = 4 sortings, each of 1-4 pooled mice. d, e, 4 weeks: PIK3CAWT and PIK3CAH1047R n = 3 sortings, each of 1-5 pooled mice; 8-11 weeks: PIK3CAWT n = 4, PIK3CAH1047R n = 5-6 sortings, each of 1-4 pooled mice. Bar graphs show means ± s.e.m.; two-sided unpaired Student’s t-test; *P < 0.05; NS, not significant.

Extended Data Figure 6 | Expression of PIK3CAH1047R induces Akt phosphorylation. Immunoblot and quantification of lysates from K8-CreERT2 control, PIK3CAWT and PIK3CAH1047R mammary glands 4 weeks after tamoxifen for p110α, pAkt, Akt, pan-keratin and Erk2 (loading control). n = 3 mice per genotype. Protein levels were normalized to pan-keratin to account for epithelial content. Bar graphs depict fold change over control lysate and show means ± s.d.; two-sided unpaired Student's t-test; *P < 0.006; NS, not significant.

Extended Data Figure 7 | Expression of basal- and luminal-lineage genes in PIK3CAH1047R subsets. a, Expression heat maps of selected luminal
/PIK3CAH1047R versus control; right: K8-CreERT2/PIK3CAH1047R versus control). LM, mature luminal cells; LP, luminal progenitors; Myo, myoepithelial; SC, stem-cell enriched. b, c, Expression profiles of basal- and luminal-lineage genes in mammary subsets of Lgr5-CreERT2/PIK3CAH1047R compared with Lgr5-CreERT2 control (b) and K8-CreERT2/PIK3CAH1047R compared with K8-CreERT2 control animals (c). The qRT-PCR results are representative of 2-3 experiments of 4 pooled animals of each genotype. Bar graphs show means ± s.e.m.; two-sided unpaired Student’s t-test; *P < 0.05; NS, not significant; N.d., not detected.

Extended Data Figure 8 | Luminal PIK3CAH1047R cells repopulate a mammary gland. a, b, Number of outgrowths in cleared-fat-pad transplantation of GFP-negative Lgr5-CreERT2 control and GFP-positive Lgr5-CreERT2/PIK3CAH1047R-expressing luminal subsets (CD24hiSca1−) (a) and of GFP-negative K8-CreERT2 control and GFP-positive K8-CreERT2/PIK3CAH1047R-expressing luminal subsets (left, CD24hiSca1−; right, CD24hiSca1+) (b). Representative carmine-stained whole mounts (bottom). Scale bars, 500 µm. c, Representative immunostained sections. Scale bars, 50 µm. a-c, Data from three independent experiments. d, Percentage of K14-, K8/18- and double-positive (K14/K8/18) colonies derived from Lgr5-CreERT2/PIK3CAH1047R and Lgr5-CreERT2 control subsets (left, pooled data from n = 4 independent experiments (1-5 pooled mice)) and from K8-CreERT2/PIK3CAH1047R and K8-CreERT2 control subsets (right, pooled data from n = 3 independent experiments (1-5 pooled mice)). The total number of quantified colonies is shown. e, Representative images of colonies. Arrowheads indicate K8/18- (white), K14- (yellow) and double-positive (blue) colonies. Scale bars, 500 µm. f, Number of colonies derived from basal and luminal cells from Lgr5- and K8-CreERT2/PIK3CAH1047R and control mice. Left, pooled data from three independent sortings (each 1-5 pooled animals): total n = 8 (control), n = 10 (mutant) technical replicates for the basal subset, n = 9 (control), n = 5 (mutant) technical replicates for the luminal CD24hiSca1− subset and n = 8 (control), n = 4 (mutant) technical replicates for the luminal CD24hiSca1+ subset. Right, pooled data from two independent sortings (each 1-5 pooled animals): total n = 8 (control), n = 10 (mutant) technical replicates for the basal subset, n = 5 (control), n = 10 (mutant) technical replicates for the luminal CD24hiSca1− subset and n = 6 (control), n = 9 (mutant) technical replicates for the luminal CD24hiSca1+ subset. Five hundred cells were seeded for each replicate. A colony was defined as a cell cluster of >5 cells. Bar graphs show means ± s.e.m.; two-sided unpaired Student’s t-test; *P < 0.05. g, Bar graphs showing the number of spheres derived from FVB-control and PIK3CAH1047R-expressing luminal (CD24hiSca1−/+) mammary cells over three passages. Representative data (three replicates, n = 4 mice per genotype) from two independent experiments. Bar graphs show means ± s.d.; *P < 0.02, two-sided unpaired Student’s t-test. h, Representative images of spheres derived from CD24hiSca1− cells in passage one (P1) and three (P3). Scale bars, 100 µm. N.d., not determined; NS, not significant.


©2015 Macmillan Publishers Limited. All rights reserved 


[Figure panels: percentage of GFP-positive tumour cells in K8-CreERT2/PIK3CAH1047R and Lgr5-CreERT2/PIK3CAH1047R tumours; K8/18 quantification; tumour phenotypes including pilosebaceous adenocarcinoma, multi-nodular rosette-type and carcinosarcoma.]

Extended Data Figure 9 | PIK3CAH1047R-evoked tumours express basal and luminal markers. a, Representative FACS plots of Lgr5-CreERT2/PIK3CAH1047R and K8-CreERT2/PIK3CAH1047R tumours (n = 3). b, Percentages of total GFP-positive cells and GFP-positive basal (CD24loSca1−) and luminal (CD24hiSca1−/+) subsets of Lgr5-CreERT2/PIK3CAH1047R and K8-CreERT2/PIK3CAH1047R tumours (n = 3). Bar graphs show means ± s.e.m.; NS, not significant; two-sided unpaired Student's t-test. c, Immunostaining for basal and luminal markers on serial sections of a multi-nodular rosette-type adenomyoepithelioma (Lgr5-CreERT2/PIK3CAH1047R) and an adenomyoepithelioma (K8-CreERT2/PIK3CAH1047R). Scale bars, 100 µm; 50 µm (magnifications). d, Quantification of basal- and luminal-lineage markers in Lgr5-CreERT2/PIK3CAH1047R and K8-CreERT2/PIK3CAH1047R tumours. Each dot represents one tumour (top: K8/18, K14 and SMA n = 15, K5 n = 14, ER n = 10, PR n = 9, p63 n = 8; bottom: K8/18, K14, SMA and K5 n = 15, ER, PR and p63 n = 10). All Lgr5-CreERT2/PIK3CAH1047R tumours, and 8/10 and 6/10 of K8-CreERT2/PIK3CAH1047R tumours, show more than 1% of ER- and/or PR-positive cells, respectively. Bar graphs show means ± s.d. e, Representative haematoxylin and eosin stainings of tumour phenotypes. Scale bars, 100 µm. f, Percentage of benign and malignant mammary tumours.

Extended Data Figure 10 | Expression profiling of K8- and Lgr5-CreERT2/PIK3CAH1047R mammary tumours. a, Principal component analysis and dendrogram of a hierarchical clustering of gene expression profiles from 10 K8- and 10 Lgr5-CreERT2/PIK3CAH1047R tumours and 2-3 reference mammary glands. Each dot indicates one sample. Circles represent
breast cancer gene signatures. Lum, luminal.

Reactivation of multipotency by oncogenic PIK3CA induces breast tumour heterogeneity

Breast cancer is the most frequent cancer in women and consists of heterogeneous types of tumours that are classified into different histological and molecular subtypes. PIK3CA and P53 (also known as TP53) are the two most frequently mutated genes and are associated with different types of human breast cancers. The cellular origin and the mechanisms leading to PIK3CA-induced tumour heterogeneity remain unknown. Here we used a genetic approach in mice to define the cellular origin of Pik3ca-derived tumours and the impact of mutations in this gene on tumour heterogeneity. Surprisingly, expression of the oncogenic Pik3caH1047R mutant at physiological levels in basal cells using keratin (K)5-CreERT2 mice induced the formation of luminal oestrogen receptor (ER)-positive/progesterone receptor (PR)-positive tumours, while its expression in luminal cells using K8-CreERT2 mice gave rise to luminal ER+ PR+ tumours or basal-like ER− PR− tumours. Concomitant deletion of p53 and expression of Pik3caH1047R accelerated tumour development and induced more aggressive mammary tumours. Interestingly, expression of Pik3caH1047R in unipotent basal cells gave rise to luminal-like cells, while its expression in unipotent luminal cells gave rise to basal-like cells before progressing into invasive tumours. Transcriptional profiling of cells that underwent cell fate transition upon Pik3caH1047R expression in unipotent progenitors demonstrated a profound oncogene-induced reprogramming of these newly formed cells. It also identified gene signatures characteristic of the different cell fate switches that occur upon Pik3caH1047R expression in basal and luminal cells, which correlated with the cell of origin, tumour type and different clinical outcomes. Altogether, our study identifies the cellular origin of Pik3ca-induced tumours and reveals that oncogenic Pik3caH1047R activates a multipotent genetic program in normally lineage-restricted populations at the early stage of tumour initiation, setting the stage for future intratumoural heterogeneity. These results have important implications for our understanding of the mechanisms controlling tumour heterogeneity and for the development of new strategies to block PIK3CA breast cancer initiation.

Breast cancers can be classified into different histological and molecular subtypes, including luminal (ER+ and/or PR+), HER2+ and basal-like/triple-negative (ER− PR− HER2−) cancers, which are usually associated with different gene expression and mutation profiles, prognosis and response to therapies. PIK3CA mutations are found in about 30% of breast cancers, more frequently in luminal tumours, although they are also found in basal-like/triple-negative breast cancers. Expression of oncogenic Pik3caH1047R in all mammary gland lineages using MMTV-Cre mice, or preferentially in luminal progenitors using WAP-Cre mice, induces heterogeneous mammary tumours. The reason for this tumour heterogeneity upon expression of the Pik3caH1047R mutant in the mammary gland is currently unknown.

To determine whether breast tumour heterogeneity is determined by the cancer cell of origin, we developed a genetic strategy that allows expression of the oncogenic Pik3ca mutant at physiological levels, using Cre-inducible Pik3caH1047R knock-in mice, specifically in basal cells (BCs) using K5-CreERT2 mice or in luminal cells (LCs) using K8-CreERT2 mice. We then followed the fate and tumorigenic potential of these cells over time. Tamoxifen (TAM) was administered at a dose that does not impair long-term mammary gland development and homeostasis, and resulted in the specific labelling of about 20% of BCs (Extended Data Fig. 1) in 4-5-week-old K5-CreERT2/Pik3caH1047R mice (Fig. 1a). While it has been suggested that the mammary gland contains bipotent basal stem cells, our data using K5-CreERT2 knock-in or K14-rtTA/TetO-Cre mice, despite the labelling of 20-50% of BCs, showed no contribution of BCs to the luminal lineage (Extended Data Fig. 1). Further lineage-tracing studies that label all BCs or all LCs will be required to determine whether the discrepancy between the different


[Figure 1 panels: a, b, genetic strategy (Lox sites); c, d, tumour-free mice (%) versus months elapsed for K5-CreERT2/YFP, K5-CreERT2/Pik3ca/YFP, K8-CreERT2/YFP and K8-CreERT2/Pik3ca/YFP mice; e, f, pie charts of adenomyoepithelioma, mixed adenomyoepithelioma + myoepithelial carcinoma, metaplastic carcinoma and NST carcinoma.]

Figure 1 | Oncogenic Pik3ca expression in BCs or LCs leads to distinct tumour phenotypes. a, b, Genetic strategy to target Pik3caH1047R to BCs (a) or LCs (b). WT, wild type. c, d, Tumour-free survival curves of K5-CreERT2/Pik3caH1047R/Rosa26-YFP mice (n = 24 mice) (latency of 12 ± 4 months (mean ± s.d.)) (c) and K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice (n = 11 mice) (latency of 15 ± 4 months) (d). No tumours were observed in control K5-CreERT2/Rosa26-YFP mice (n = 10 mice) (c) or K8-CreERT2/Rosa26-YFP mice (n = 10 mice) (d). e, f, Pie charts showing tumour classification in BC-derived (e) and LC-derived (f) tumours. BC-derived tumours were all classified as adenomyoepitheliomas (n = 36 tumours). LC-derived tumours comprised adenomyoepitheliomas (n = 8 tumours), mixed adenomyoepithelioma with myoepithelial carcinoma (n = 6 tumours), metaplastic carcinoma (n = 2 tumours) and invasive carcinoma of NST (n = 1 tumour) (f). Detailed histological characterization is presented in Extended Data Fig. 2.

studies arises from the unspecific and simultaneous labelling of BCs and LCs. BC-derived mammary tumours arose with a latency of about 12 ± 4 months (mean ± standard deviation (s.d.)). They all consisted of luminal-like ER+ PR+ tumour cells surrounded by BCs (Fig. 1c, e and Extended Data Fig. 2a-e) and were classified by pathologists as adenomyoepitheliomas, as described in mice and humans (Extended Data Fig. 3a-d). Principal component analysis (PCA) and gene clustering analysis of gene expression profiles from fluorescence-activated cell sorting (FACS)-isolated tumour cells using the PAM50 gene set showed that these BC-derived tumours clustered together with the luminal B breast cancer subtype (Extended Data Figs 3 and 4).

The same dose of TAM was administered to 4-5-week-old K8-CreERT2/Pik3caH1047R mice, resulting in the specific labelling of about 20-30% of LCs (Fig. 1b and Extended Data Fig. 1). Mammary tumours arose with a similar latency (15 ± 4 months) (Fig. 1d). Histological and immunofluorescence analysis revealed that these tumours were more heterogeneous, more aggressive and more proliferative than BC-derived tumours. These tumours comprised adenomyoepithelioma, mixed adenomyoepithelioma with myoepithelial carcinoma and invasive carcinoma of no special type (NST), as well as tumours showing features of metaplastic basal-like breast cancers similar to human breast cancers (Fig. 1f and Extended Data Figs 2, 3). Principal component and gene expression clustering analyses of cells isolated from seven different luminal-derived tumours showed that ER+ tumours clustered together with luminal human breast cancers, NST tumours clustered between luminal B and HER2+ tumours, and metaplastic carcinomas clustered with basal-like or HER2+ cancers depending on the clustering algorithm (Extended Data Figs 3 and 4), consistent with the phenotypic heterogeneity of the tumours. These results revealed that Pik3caH1047R expression in LCs gives rise to distinct types of tumours that are generally more aggressive than BC-derived tumours. The greater tumour heterogeneity found in the LC-derived tumours may arise from the greater plasticity of LCs and/or the heterogeneity of the luminal progenitor populations initially targeted in the K8-CreERT2 mice.

We then assessed whether concomitant p53 deletion affects the phenotype of mammary tumours depending on their cellular origin. K5-CreERT2/Pik3ca(H1047R)/p53fl/fl mice treated with TAM rapidly developed skin and other cancers that required terminating the experiment before they developed mammary tumours (data not shown). To circumvent this problem, we used mice heterozygous for p53 (K5-CreERT2/Pik3ca(H1047R)/p53fl/+) and another basal Cre driver (K14-rtTA/TetO-Cre/Pik3ca(H1047R)/p53fl/+), which alleviated the increased early mortality seen with the K5-CreERT2/Pik3ca(H1047R)/p53fl/fl mice. BC-derived p53 heterozygous tumours arose with a latency of 9 ± 2 months and consisted mostly of luminal-like adenomyoepithelioma (42–75%), as well as myoepithelial carcinoma (0–16%), NST tumours (0–12%) and metaplastic carcinoma (12–42%) (Fig. 2 and Extended Data Fig. 5). As previously shown using MMTV-Cre mice, Pik3ca(H1047R) expression together with p53 deletion in LCs dramatically accelerates tumour formation, with a latency of 5 ± 1 months for p53 homozygous and 9 ± 3 months for p53 heterozygous mice (Fig. 2). In contrast to BC-derived tumours, LC-derived p53-deficient tumours were always aggressive carcinomas, mostly metaplastic carcinoma and high-grade myoepithelial carcinoma with characteristics of epithelial-to-mesenchymal transition (Fig. 2 and Extended Data Fig. 5), as previously reported following Pik3ca(H1047R) expression in all mammary gland cells and as found in human basal-like breast cancers with activation of the PI3K pathway by somatic PIK3CA mutations and gene copy number amplification. Gene expression clustering of these tumours using the PAM50 genes showed that they clustered together with human basal-like or HER2+ subtypes depending on the clustering algorithm (Extended Data Fig. 3k, l). These data demonstrate that concomitant Pik3ca(H1047R) expression and p53 deletion accelerate tumour development in basal and luminal lineages
Figure 2 | Oncogenic Pik3ca expression and p53 deletion in BCs or LCs lead more frequently to highly invasive mammary tumours. a–c, Genetic strategy to target Pik3ca(H1047R) expression and p53 deletion in BCs (a, b) or in LCs (c). d–f, Tumour-free survival curves in K5-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP (9 ± 3 months latency; mean ± s.d.) (n = 6 mice) (d), K14-rtTA/TetO-Cre/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP (9 ± 1 month latency) (n = 14 mice) (e), K8-CreERT2/Pik3ca(H1047R)/p53fl/fl/Rosa26-YFP mice (5 ± 1 month latency) (n = 20 mice) and K8-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP mice (9 ± 3 months latency) (n = 17 mice) (f). Control mice did not develop tumours (n = 10 mice per condition). g–i, Pie charts depicting the classification of mammary tumours in K5-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP (n = 8 tumours) (g), K14-rtTA/TetO-Cre/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP (n = 19 tumours) (h), K8-CreERT2/Pik3ca(H1047R)/p53fl/fl/Rosa26-YFP (n = 40 tumours) and K8-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP (n = 22 tumours) (i). Detailed histological characterization is presented in Extended Data Fig. 5.
and that very aggressive metaplastic tumours arise more frequently from oncogenic targeting of LCs than of BCs.

To define further the cellular basis of the intratumoural heterogeneity found in Pik3ca(H1047R)-derived tumours, we combined Rosa26-YFP lineage tracing with Pik3ca(H1047R) expression specifically in LCs or BCs and assessed cell fate changes over time. Interestingly, as early as 5 weeks after Pik3ca(H1047R) expression in LCs, yellow fluorescent protein (YFP) was also detected in basal-like cells clustered around LCs (Fig. 3a–e and Extended Data Fig. 6), whereas, as previously described, K8-CreERT2-targeted cells consist of a self-sustained unipotent population of LCs (Extended Data Fig. 6a–e). Clonal analysis of LCs expressing oncogenic PIK3CA revealed the presence of bipotent clones containing adjacent LCs and BCs, which were never observed in YFP+ control LCs (Fig. 3f and Extended Data Fig. 6n–p). The relatively small proportion of K8+/K5+ BCs compared with K8−/K5+ BCs suggests that, in the initial stage of LC-to-BC transition, these cells expressed markers of both lineages before maturing into basal-like cells and losing expression of LC markers (Fig. 3g and Extended Data Fig. 6q–t). This is consistent with the sequential gene expression shown by quantitative reverse transcription polymerase chain reaction (qRT-PCR) analysis of FACS-isolated BCs and LCs after Pik3ca(H1047R) expression (Extended Data Fig. 6u, v). The proportions of both YFP+ LCs and YFP+ BCs increased over time (Fig. 3e), suggesting that Pik3ca(H1047R) confers a competitive advantage on targeted luminal cells. To determine functionally whether LCs acquired multipotency upon PIK3CA expression, we tested the ability of Pik3ca(H1047R)-expressing LCs and their BC progeny to reconstitute the mammary gland upon transplantation into mammary fat pads. FACS-isolated LCs expressing Pik3ca(H1047R) were able to form outgrowths of mammary epithelium containing both BCs and LCs (in 6 out of 28 transplants), while, as previously
Figure 3 | Oncogenic Pik3ca expression induces multipotency in unipotent luminal and basal progenitors. a, b, Immunofluorescence of K8/YFP at 1 week (a) and K5/YFP at 5 weeks (b) after TAM administration to K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice. m, month; w, week. c, d, FACS analysis of CD24 and CD29 expression in Lin−YFP+ cells 1 week (c) or 8 weeks (d) after TAM induction. e, Percentage of YFP+ cells within LCs (CD29lo/CD24+) and BCs (CD29hi/CD24+) at different time points after TAM administration (n = 3, 6, 3, 6, 4 mice for 1 week, 5 weeks, 8 weeks, 4 months and 7 months, respectively). f, Immunofluorescence of K8/K5/YFP 8 weeks after clonal induction of K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice. Arrow points to a K5/K8 Pik3ca(H1047R)/YFP+ BC newly generated from an LC. g, Percentage of YFP+ cells expressing K5 and/or K8 at different time points after Pik3ca(H1047R) expression in LCs (n = 3 mice per condition). See Methods for more details. h, i, Immunofluorescence of K5/K8 of a mammary outgrowth derived from LCs (h) or BCs (i) from K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice. j, k, Immunofluorescence of K5/YFP 1 week (j) or of K8/YFP 7 months (k) after TAM administration to K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice. l, m, FACS analysis of CD24 and CD29 expression in Lin−YFP+ cells 1 week (l) or 7 months (m) after TAM induction. n, Percentage of YFP+ cells within LCs and BCs at different time points after TAM administration (n = 3, 3, 5, 4, 3 mice for 1 week, 5 weeks, 8 weeks, 7 months and 12 months, respectively). o, Immunofluorescence for K8/K5/YFP 7 months after clonal Pik3ca(H1047R)/YFP expression in BCs. Arrow points to a newly formed K8+YFP+ LC arising from a BC. p, Mean number of colonies per 1,000 sorted luminal cells in an in vitro colony-forming assay of YFP+ LCs derived from K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice induced for 12 months or of wild-type LCs (n = 3 biologically independent experiments per condition). Circles, individual data points. Error bars, standard error of the mean (s.e.m.). Scale bars, 10 μm.


described, wild-type LCs derived from K8-CreERT2/Rosa26-YFP mice were not able to form mammary outgrowths under the same conditions (0 out of 10 transplants) (Fig. 3h). Likewise, transplantation of newly formed BCs from K8-CreERT2/Pik3ca(H1047R) mice generated mammary outgrowths containing BCs and LCs (7/11) as efficiently as control BCs (8/10) (Fig. 3i). Altogether, these data show that oncogenic Pik3ca promotes multilineage differentiation of LCs, inducing cellular heterogeneity at the early stage of the tumour initiation process.

In contrast to the early multilineage differentiation observed after oncogenic PIK3CA expression in LCs, BCs remained unipotent during the first few months after TAM administration to K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice. Only around 7 months after oncogene expression did newly formed LCs become detectable, and they progressively increased over time (Fig. 3j–o and Extended Data Fig. 7). Immunostaining and qRT-PCR showed that these newly formed LCs expressed luminal markers at levels similar to those of wild-type LCs and no longer expressed high levels of basal markers (Extended Data Fig. 7k, l). To determine whether LCs derived from BCs functionally correspond to luminal progenitors, we assessed the clonogenic potential of these cells using a colony-forming assay that only allows the growth of LCs. The number of colonies derived from FACS-isolated LCs after Pik3ca(H1047R) expression in BCs was significantly higher than that obtained with wild-type LCs, supporting the notion that oncogenic PIK3CA promotes the reprogramming of BCs into functional LCs (Fig. 3p). Altogether, these data show that Pik3ca(H1047R) induces multipotency in otherwise lineage-restricted basal and luminal unipotent progenitors, inducing cellular heterogeneity in oncogene-targeted cells before they progress into more invasive tumours.

To define the molecular mechanisms by which Pik3ca(H1047R) promotes multipotency and tumour heterogeneity, we performed transcriptional profiling of FACS-isolated basal-like and luminal-like cells after Pik3ca(H1047R) induction in LCs (K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice; B-K8PIK and L-K8PIK) and in BCs (K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice; B-K5PIK and L-K5PIK) (Fig. 3d, m and Supplementary Table 1). Gene expression clustering analysis showed that profound reprogramming occurred in PIK3CA(H1047R)-expressing cells as they underwent transition between basal and luminal lineages, the cells becoming molecularly similar to the mammary lineages into which they were converted (Fig. 4a).

To unravel the molecular mechanisms by which oncogenic Pik3ca induced changes in cell fate, and to determine whether these mechanisms are conserved or distinct across different cells of origin, we defined the gene signature induced by the expression of oncogenic Pik3ca in each population (that is, B-K5PIK versus B-K5YFP, L-K5PIK versus L-K8YFP, and so on). Only three genes were upregulated in all conditions (Serpina3n, Gdpd3 and Zfp949), and most of the genes upregulated by Pik3ca(H1047R) expression depended on both the origin of the cell in which Pik3ca(H1047R) was initially expressed and the cell lineage in which the oncogene was currently expressed (Fig. 4b, Supplementary Table 2 and Extended Data Fig. 8). While Pik3ca(H1047R) induced the expression of specific genes according to the cellular origin and the basal or luminal phenotype of the cells, 51 annotated genes were commonly upregulated in L-K5PIK and in B-K8PIK. These included the long non-coding RNA Neat1 and the transcription factor Runx2, which both regulate mammary cell fate, as well as genes regulating signal transduction (for example, Sfrp2), cellular metabolism (for example, Tktl1, Bdh1) and cell adhesion (for example, EphB2, Trio) (Fig. 4b, c and Supplementary Table 2), suggesting that both common and distinct mechanisms induce cell fate changes upon oncogenic Pik3ca expression.
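Defining a signature per population and comparing signatures (Fig. 4b) amounts to thresholding fold changes against the matched control and intersecting the resulting gene sets. The sketch below illustrates that logic only. The >1.5-fold cutoff comes from the Fig. 4b legend, but the toy expression values, the function name `upregulated` and the use of a simple ratio of means (rather than whatever statistical model the authors applied) are assumptions.

```python
from statistics import mean

FOLD_CUTOFF = 1.5  # threshold quoted in the Fig. 4b legend


def upregulated(treated, control, cutoff=FOLD_CUTOFF):
    """Return genes whose mean expression in `treated` exceeds `cutoff` x the control mean.

    Both arguments map gene name -> list of linear-scale expression values
    (hypothetical input format; real array data would be normalized first).
    """
    genes = set()
    for gene, values in treated.items():
        ctrl = mean(control[gene])
        if ctrl > 0 and mean(values) / ctrl > cutoff:
            genes.add(gene)
    return genes


# Hypothetical toy data for two comparisons (e.g. B-K8PIK vs B-K8YFP, L-K5PIK vs L-K8YFP)
b_k8pik = {"Neat1": [8.0, 9.0], "Runx2": [6.0, 6.5], "Krt16": [20.0, 22.0], "Gapdh": [100.0, 101.0]}
b_k8yfp = {"Neat1": [4.0, 4.5], "Runx2": [3.0, 3.5], "Krt16": [5.0, 5.0], "Gapdh": [100.0, 99.0]}
l_k5pik = {"Neat1": [7.0, 7.5], "Runx2": [5.0, 5.2], "Krt16": [1.0, 1.1], "Gapdh": [98.0, 100.0]}
l_k8yfp = {"Neat1": [3.0, 3.2], "Runx2": [2.0, 2.1], "Krt16": [1.0, 1.0], "Gapdh": [99.0, 100.0]}

up_b = upregulated(b_k8pik, b_k8yfp)
up_l = upregulated(l_k5pik, l_k8yfp)
shared = up_b & up_l  # centre of a two-set Venn diagram
only_b = up_b - up_l
only_l = up_l - up_b
print(sorted(shared), sorted(only_b), sorted(only_l))
# prints: ['Neat1', 'Runx2'] ['Krt16'] []
```

With more than two populations, the same set operations extend to every region of the Venn diagram.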

The BC-to-LC fate transition induced by oncogenic Pik3ca was accompanied by the expression of a distinct set of genes (basal-to-luminal multipotent signature) (Fig. 4). Some of these genes, such as Ntrk3 (also known as TrkC), were already upregulated in BCs after Pik3ca(H1047R) expression (Supplementary Table 2), suggesting that they represent the signature of the BC of origin. After the BC-to-LC fate transition, Pik3ca(H1047R) expression induced genes that are specific for the newly formed LCs, including Ntrk2 (also known as TrkB), a neurotrophin receptor expressed in wild-type BCs and transiently expressed in LCs during the Pik3ca(H1047R)-induced BC-to-LC fate transition (Fig. 4d and Extended Data Fig. 7m). NTRK2 and NTRK3 have previously been shown to be expressed in breast cancers and to regulate the survival of breast cancer stem cells in response to chemotherapy. In addition, a translocation leading to a gene fusion between Etv6 and Ntrk3 causes breast tumours in both mice and humans. Etv6-Ntrk3 expression in the mammary gland predominantly induces the same type of bipotent mammary tumours that arise from Pik3ca(H1047R) expression in BCs, supporting a role for the Pik3ca/Ntrk2-Ntrk3 axis in the establishment of bipotency during breast tumour initiation. The basal-to-luminal multipotent signature also contained genes commonly upregulated in L-K8PIK (Extended Data Fig. 8k), reflecting the consequence of PIK3CA expression in LCs.
In contrast, the luminal-to-basal multipotent signature was characteristic of a wounding and proliferation response marked by the upregulation of Il33 (also known as alarmin; a cytokine that has been shown to be overexpressed in breast cancers and to attenuate the NK cell response against tumour cells), Il24a, Krt16, Itgb6, Itga2, Itga5, Tnc, Cd109, Plau, Wnt10a, Timp3, Inhba, Ngf, Ereg, Ccnd1 and Ccnd2 (Fig. 4e). As found during the BC-to-LC transition, most of the luminal-to-basal multipotent signature genes were specific for the newly formed BCs (for example, Ereg, Ccnd1, Wnt10a, Il33), while a significant fraction of these genes (for example, Krt16, Il24a, Ccnd2, Inhba, Tnc) were already upregulated in LCs targeted by oncogenic PIK3CA, suggesting that they represent the signature of the LC of origin (Supplementary Tables 2, 3 and Extended Data Fig. 8).

To define the relevance of these multipotency gene signatures to tumour progression, we assessed the expression of these genes in Pik3ca(H1047R)-derived tumours. Some luminal-to-basal multipotent signature genes, such as Il24a, Krt16 and Plau, were only upregulated during the initial stage of reprogramming and downregulated thereafter, whereas other genes, such as Col11a1, the epidermal growth factor receptor (EGFR) ligand Ereg, Inhba, Wnt10a and Tnc, continued to be expressed, or even further increased, in basal-like breast cancers arising from LCs (Fig. 4e, f). Similarly, Ntrk2 and Ntrk3 were expressed or even further upregulated in K5-CreERT2/Pik3ca(H1047R)-derived luminal tumours (Fig. 4f). These data indicate that the expression of some of the genes associated with cell fate transition during the early steps of tumour initiation increases with tumour progression.
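The RT-PCR values in Fig. 4c–f are normalized to Gapdh and expressed relative to age-matched controls. One common way to compute such values is the comparative Ct (2^−ΔΔCt) method, sketched below. The Methods in this excerpt do not say which quantification model was used, so the method itself, the function name `relative_expression` and the Ct values are assumptions for illustration.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    ct_target / ct_ref: Ct of the gene of interest and of the reference gene
    (e.g. Gapdh) in the sample; ct_target_ctrl / ct_ref_ctrl: the same two
    values in the calibrator (e.g. age-matched control LCs).
    Assumes close to 100% amplification efficiency for both amplicons.
    """
    delta_sample = ct_target - ct_ref            # normalize to the housekeeping gene
    delta_control = ct_target_ctrl - ct_ref_ctrl
    delta_delta = delta_sample - delta_control   # compare with the calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values for one gene in a tumour sample versus control LCs
fold = relative_expression(ct_target=24.0, ct_ref=18.0, ct_target_ctrl=27.0, ct_ref_ctrl=18.0)
print(fold)  # 8.0: detected three cycles earlier than in controls, i.e. 2**3-fold higher
```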

To define the relevance of the Pik3ca(H1047R)-induced multipotent gene signatures in human breast cancers, we assessed whether the different multipotent signatures correlated with a particular molecular breast cancer subtype or with disease-free survival in a cohort of systemically untreated breast cancer patients. Interestingly, the luminal-to-basal transition gene signature was strongly associated with basal-like breast cancers (Fig. 4g). Higher expression levels of this gene signature or of individual genes such as NGF, INHBA, ITGB6 and WNT10A were associated with poor clinical outcome (Fig. 4h and Extended Data Fig. 9), consistent with the more aggressive tumour types induced by Pik3ca(H1047R) expression in LCs. In contrast, the BC-to-LC fate signature was associated with luminal A and normal-like human breast cancers (Fig. 4i). High expression levels of this gene signature were significantly associated with better prognosis (Fig. 4j), consistent with the less aggressive tumours arising from BCs. These data indicate that the genetic programs associated with Pik3ca(H1047R)-induced multipotency correlate with distinct molecular subtypes of human breast cancers and that their expression levels correlate with distinct clinical outcomes.
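Relating a multigene signature to disease-free survival (Fig. 4h, j) typically means collapsing the signature into one score per patient, stratifying patients by that score and comparing survival curves. The sketch below uses the mean of per-gene z-scores, a median split and a Kaplan–Meier estimator. The excerpt does not specify the scoring or stratification actually used, so these choices, the function names and the six-patient toy cohort are all assumptions.

```python
from statistics import mean, median, pstdev


def signature_scores(expr):
    """Mean z-score across signature genes for each patient.

    expr: dict gene -> list of expression values, one per patient
    (same patient order for every gene).
    """
    n = len(next(iter(expr.values())))
    z_by_gene = []
    for values in expr.values():
        mu, sd = mean(values), pstdev(values)
        z_by_gene.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(z[i] for z in z_by_gene) for i in range(n)]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; events[i] is 1 for relapse, 0 for censoring.

    Returns a list of (time, survival probability) at each event time.
    """
    s = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if d:
            s *= 1 - d / at_risk
            curve.append((t, s))
    return curve


# Hypothetical cohort of six patients and a two-gene signature
expr = {"NGF": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "INHBA": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]}
times = [60, 55, 40, 30, 20, 12]  # months of follow-up
events = [0, 0, 1, 0, 1, 1]       # 1 = relapse observed

scores = signature_scores(expr)
cut = median(scores)
high = [i for i, s in enumerate(scores) if s > cut]
low = [i for i, s in enumerate(scores) if s <= cut]
km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
print(km_high, km_low)
```

A formal comparison of the two curves (for example, a log-rank test) would follow; it is omitted here to keep the sketch short.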

Our study shows that the cell of origin controls tumour heterogeneity in Pik3ca(H1047R)-induced mammary tumours. Pik3ca(H1047R) expression in LCs gives rise to aggressive basal-like tumours, whereas its expression in BCs gives rise to less aggressive luminal-like tumours. We demonstrate that Pik3ca(H1047R) induces multipotency in unipotent progenitors. The promotion of multipotency by Pik3ca(H1047R) is regulated by common and cell-lineage-specific molecular mechanisms that are influenced by the cell type in which the oncogene is initially expressed, setting the stage for future tumour heterogeneity and influencing clinical outcome in patients with breast cancers.


Figure 4 | Molecular characterization of oncogenic Pik3ca-induced multipotency. a, Hierarchical gene expression clustering of BCs and LCs with or without Pik3ca(H1047R) expression. Green and red correspond to highly and lowly expressed genes, respectively. The two major branches of the tree are supported by bootstrap values of 100. o, induced for 10–12 months; y, induced for 8 weeks. b, Venn diagram of upregulated genes (>1.5-fold) after Pik3ca(H1047R) expression in BCs and LCs. c–e, RT-PCR analysis of genes belonging to the common (c), basal-to-luminal (d) or luminal-to-basal (e) multipotency signature in the B-K8PIK and L-K5PIK cell populations, 8 weeks and 10–12 months after Pik3ca(H1047R) expression, respectively, compared with their age-matched controls. Gene expression was normalized to the Gapdh housekeeping gene (n = 4 biologically independent samples). f, RT-PCR analysis of the multipotency signature genes in control cells, in BC-derived adenomyoepithelioma and in LC-derived metaplastic tumours. Data were normalized to gene expression in age-matched control LCs (L-K8YFPo) (n = 4 biologically independent samples). g, i, Expression levels of the luminal-to-basal (g) or basal-to-luminal (i) multipotency signature in a large set of breast cancer patients according to their PAM50 subtype. Lum, luminal. h, j, Disease-free survival in untreated patients according to the level of expression of the genes of the luminal-to-basal (h) or basal-to-luminal (j) multipotency signature. k–n, Summary of the role of the cancer cell of origin in Pik3ca(H1047R)-induced tumour heterogeneity. k, Expression of Pik3ca(H1047R) in BCs gives rise to luminal-like tumours, while in LCs Pik3ca(H1047R) gives rise to more heterogeneous and aggressive tumours. Types of carcinoma are noted at the bottom of the panel. Adenomyo, adenomyoepithelioma. l, Additional p53 deletion promotes Pik3ca(H1047R)-induced tumour heterogeneity in BCs and leads to more aggressive metaplastic carcinoma in LCs. m, n, Model of Pik3ca(H1047R)-induced multipotency in LCs and BCs. Genes shown are upregulated during cell fate change. Genes highlighted in blue belong to the common multipotency signature. Error bars, s.e.m.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique

METHODS

Mice. Rosa26-YFP mice were obtained from the Jackson Laboratory. K5-CreERT2 and K8-CreERT2 mice were described previously. K14-rtTA mice were provided by E. Fuchs. TetO-Cre mice were provided by A. Nagy. Pik3ca(H1047R) knock-in mice, in which wild-type exon 20 is replaced by H1047R mutant exon 20 upon Cre recombination, were described previously. p53 mice were obtained from the National Cancer Institute at Frederick.

All experimental mice used in this study were female, of mixed strain background and more than 6 weeks old. No statistical methods were used to predetermine sample size. For all experiments presented in this study, the sample size was large enough to measure the effect size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. Mice in the tumour cohort were killed when a palpable mass of maximum 1 cm³ was detected. Mouse colonies were maintained in a certified animal facility in accordance with European guidelines. Ethical protocols were approved by the local ethical committee for animal welfare (CEBEA) of the Université Libre de Bruxelles (protocols 363 and 527).

Targeting expression of YFP and/or PIK3CA(H1047R) and deletion of p53. 
Four- to five-week-old K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP, K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP, K8-CreERT2/Pik3ca(H1047R)/p53fl/fl/Rosa26-YFP, K8-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP, K5-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP, K5-CreERT2/Rosa26-YFP and K8-CreERT2/Rosa26-YFP mice were induced with 15 mg of tamoxifen (TAM; Sigma; diluted in sunflower seed oil, Sigma) by intraperitoneal injection (3 injections of 5 mg every 3 days). TAM administration induced a transient delay in mammary gland development during puberty, but there was no long-term effect on mammary gland development and homeostasis. Five-week-old K14-rtTA/TetO-Cre/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP and K14-rtTA/TetO-Cre/Rosa26-YFP mice were induced by oral administration of a doxycycline food diet (1 g kg−1; BIO-SERV) for 5 days. For clonal analyses, 4- to 5-week-old K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP and K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice were induced with 0.05 mg or 2 mg TAM, respectively, by intraperitoneal injection. For induction in adult mice, 8-week-old K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice were induced with 15 mg of TAM by intraperitoneal injection (3 injections of 5 mg every 3 days).

Histology and immunostaining on sections. For immunofluorescence, dissected inguinal mammary glands or tumour samples were pre-fixed for 2 h in 4% paraformaldehyde at room temperature. Tissues were washed three times with PBS for 5 min and incubated overnight in 30% sucrose in PBS at 4 °C. Tissues were embedded in OCT and kept at −80 °C. Sections of 5 μm were cut using an HM560 Microm cryostat (Mikron Instruments).

For immunofluorescence, tissue sections were incubated in blocking buffer (5% horse serum, 1% BSA, 0.2% Triton-X in PBS) for 1 h at room temperature. The different primary antibody combinations were incubated overnight at 4 °C. Sections were then rinsed three times for 5 min in PBS and incubated with the corresponding secondary antibodies diluted 1:400 in blocking buffer for 1 h at room temperature. The following primary antibodies were used: anti-GFP (rabbit, 1:1,000, A11122, Molecular Probes), anti-GFP (chicken, 1:1,000, ab13970, Abcam), anti-K8 (rat, 1:1,000, Troma-I, Developmental Studies Hybridoma Bank, University of Iowa), anti-K14 (rabbit, 1:1,000, PRB-155P-0100, Covance), anti-K14 (chicken, 1:1,000, SIG-3476-0100, Covance), anti-K5 (rabbit, 1:1,000, PRB-160P-0100, Covance), anti-K19 (rat, 1:500, Troma-III, Developmental Studies Hybridoma Bank, University of Iowa), anti-ER (rabbit, 1:300, sc-542, Santa Cruz), anti-PR (rabbit, 1:300, sc-7208, Santa Cruz), anti-Her2 (rabbit, 1:300, 2165, Cell Signaling), anti-Ki67 (rabbit, 1:500, ab15580, Abcam), anti-E-cadherin (rat, 1:1,000, 14-3249-82, eBioscience), anti-vimentin (rabbit, 1:400, ab92547, Abcam), anti-Ntrk2 (rabbit, 1:500, sc-12, Santa Cruz), anti-p63 (rabbit, 1:100, Mab306-05, Santa Cruz), anti-SMA-Cy3 (mouse, 1:500, C6198, Sigma-Aldrich) and anti-claudin 3 (rabbit, 1:300, 34-1700, Invitrogen). The following secondary antibodies were used: anti-rabbit, anti-rat and anti-chicken conjugated to AlexaFluor488 (Molecular Probes), to Rhodamine Red-X or to Cy5 (Jackson ImmunoResearch). Nuclei were stained with Hoechst solution (1:2,000) and slides were mounted in DAKO mounting medium supplemented with 2.5% Dabco (Sigma).

For paraffin-embedded tissues, dissected mammary glands were pre-fixed overnight at 4 °C in 4% paraformaldehyde. Tissues were washed three times with PBS. Before automated paraffin processing, tissues were washed in tap water and kept in 70% isopropanol. Five-micrometre sections were made with a Leica RM2245 microtome.

Haematoxylin and eosin staining was performed as previously described. p63 staining on tumour paraffin sections was performed on an automated IHC platform (Ventana Discovery XT). Briefly, paraffin sections were deparaffinized and rehydrated. Antigen unmasking was performed for 36 min at 95 °C in EDTA (pH 9). Slides were incubated with anti-p63 (clone 7JUL, 1:100, Leica) for 3 h, followed by a rabbit anti-mouse linker (clone M1gG51-4, Abcam, 1:750) for 16 min. Finally, slides were incubated with the OmniMap HRP-conjugated anti-rabbit antibody (Ventana) for 12 min. A standard ABC kit and ImmPACT DAB (Vector Laboratories) were used for the detection of HRP activity. Nuclei were stained with Mayer's haematoxylin (Labonord), followed by dehydration and mounting with SafeMount (Labonord).

Whole-mount mammary gland immunofluorescence. For clonal analyses, dissected inguinal mammary glands were incubated in 2 ml HBSS plus 30 U ml−1 collagenase plus 300 μg ml−1 hyaluronidase (Sigma) for 30 min at 37 °C under agitation. After three 5-min washes with HBSS, mammary glands were fixed in 4% paraformaldehyde for 2 h at room temperature, washed three times for 10 min in PBS under agitation and incubated in blocking buffer (5% horse serum, 1% BSA, 0.8% Triton-X in PBS) for 3 h at room temperature. The primary antibody combination, diluted in blocking buffer, was incubated overnight at room temperature under agitation. Samples were washed three times for 10 min in PBS/0.2% Tween-20 and incubated with secondary antibodies diluted in blocking buffer for 5 h under agitation. Cell nuclei were stained with Hoechst for 30 min (1:1,000 in PBS/0.2% Tween-20). Samples were mounted on slides in DAKO mounting medium supplemented with 2.5% Dabco (Sigma).

Staining on human breast cancer sections. Tissue samples were obtained retrospectively from archival formalin-fixed and paraffin-embedded samples in the Department of Pathology of the Erasme Hospital. Histopathological diagnoses were reviewed and assessed according to the 2012 World Health Organization Classification. Sections of 5 μm were subjected to standard immunohistochemistry (IHC) as previously described, using monoclonal anti-CK8/18 (1:200; clone 5D3; BioGenex), anti-CK14 (1:100; clone LL002; Leica) and anti-p63 (1:200; clone 7JUL; Leica) antibodies. Staining was visualized with streptavidin-biotin-peroxidase complex kit reagents (BioGenex) using diaminobenzidine/H2O2 as the chromogenic substrate. Sections were counterstained with Mayer's haematoxylin (Labonord), followed by dehydration and mounting with SafeMount (Labonord).

Microscope image acquisition. Pictures were acquired on an Axio Observer Z1 microscope using ×10 and ×40 Zeiss EC Plan-NEOFLUAR objectives, with an AxioCamMR3 camera and the Axiovision software (Carl Zeiss). Confocal images in Fig. 3f, h, i, o, Extended Data Fig. 6f, h–l, n, o, q–t and Extended Data Fig. 7i, m were acquired at room temperature using a Zeiss LSM780 multiphoton confocal microscope fitted on an Axiovert M200 inverted microscope equipped with C-Apochromat (×40, 1.2 numerical aperture) water immersion objectives (Carl Zeiss). Optical sections of 1,024 × 1,024 pixels were collected sequentially for each fluorochrome. The data sets generated were merged and displayed with the ZEN software.

Mammary gland and tumour cell dissociation. Mammary glands were dissected and lymph nodes removed. Tissues were briefly washed in HBSS and chopped with a McIlwain tissue chopper. Chopped tissues were placed in HBSS plus 300 U ml−1 collagenase (Sigma) plus 300 μg ml−1 hyaluronidase (Sigma) and digested for 2 h at 37 °C under agitation. Physical dissociation using a P1000 pipette was performed every 15 min throughout the enzymatic digestion. EDTA at a final concentration of 5 mM was added for 10 min to the resultant organoid suspension, followed by 0.25% trypsin-EGTA for 2 min (only in the case of normal mammary glands), before filtration through a 70-μm mesh, two successive washes in 2% FBS/PBS and antibody labelling.

Cell labelling, flow cytometry and sorting. Two to five million cells per condition were incubated in 250 μl 2% FBS/PBS with fluorochrome-conjugated primary antibodies for 30 min, with vortexing every 10 min. Cells were washed with 2% FBS/PBS and resuspended in 2.5 μg ml−1 4′,6-diamidino-2-phenylindole (DAPI; Invitrogen) before analysis. The primary antibodies used were: PE-Cy7-conjugated anti-CD24 (1:50, clone M1/69, BD Biosciences), APC-conjugated anti-CD29 (1:50, clone eBioHMb1-1, eBioscience), PE-conjugated anti-CD45 (1:50, clone 30-F11, eBioscience), PE-conjugated anti-CD31 (1:50, clone MEC 13.3, BD Biosciences) and PE-conjugated anti-CD140a (1:50, clone APA5, eBioscience). Data analysis and cell sorting were performed on a FACSAria sorter using the FACSDiva software (BD Biosciences). Dead cells were excluded with DAPI, and CD45-, CD31- and CD140a-positive cells were excluded (Lin−) before analysis of the YFP+ cells. For profile analysis, a minimum of 1,000 YFP+ cells were analysed per sample.
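The gating hierarchy above (live DAPI− cells, then Lin− cells, then YFP+ cells split into CD29lo/CD24+ luminal and CD29hi/CD24+ basal populations) can be written as a chain of boolean filters. This is only an illustration: the event format, the combined `lin` channel and every numeric threshold are hypothetical, since in practice gates are drawn on FACSDiva plots against control samples rather than set as fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One cytometry event with hypothetical arbitrary-unit intensities."""
    dapi: float
    lin: float  # combined PE signal from CD45, CD31 and CD140a
    yfp: float
    cd24: float
    cd29: float


# Hypothetical gate thresholds; real gates are set on control samples
DAPI_MAX, LIN_MAX, YFP_MIN, CD24_MIN, CD29_SPLIT = 100.0, 100.0, 500.0, 300.0, 1000.0


def classify(e):
    """Return 'LC', 'BC' or None following the gating order described in the Methods."""
    if e.dapi > DAPI_MAX:  # DAPI+ = dead cell
        return None
    if e.lin > LIN_MAX:  # CD45+/CD31+/CD140a+ = lineage-positive, excluded
        return None
    if e.yfp < YFP_MIN or e.cd24 < CD24_MIN:  # keep YFP+ CD24+ epithelial cells only
        return None
    return "BC" if e.cd29 >= CD29_SPLIT else "LC"  # CD29hi basal vs CD29lo luminal


events = [
    Event(dapi=10, lin=20, yfp=800, cd24=900, cd29=300),   # live, Lin-, YFP+, CD29lo -> LC
    Event(dapi=10, lin=20, yfp=800, cd24=600, cd29=2500),  # CD29hi -> BC
    Event(dapi=900, lin=20, yfp=800, cd24=600, cd29=300),  # dead
    Event(dapi=10, lin=700, yfp=800, cd24=600, cd29=300),  # Lin+
]
labels = [classify(e) for e in events]
print(labels)  # ['LC', 'BC', None, None]
```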

Tumour harvesting and classification. Tumours were detected by mammary gland palpation. Mice were killed when one tumour reached a maximum diameter of 1 cm. K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice presented 1 tumour in 58%, 2 tumours in 25% and 3 or more tumours in 17% of cases at the time of analysis (a total of 36 tumours from 24 mice were analysed). K8-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice presented 1 tumour in 64%, 2 tumours in 18% and 3 or more tumours in 18% of cases at the time of analysis (a total of 17 tumours from 11 mice were analysed). K5-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP mice presented 1 tumour in 83% and 3 tumours in 17% of cases at the time of analysis (a total of 8 tumours from 6 mice were analysed). K14-rtTA/TetO-Cre/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP mice presented 1 tumour in 57%, 2 tumours in 36% and 3 or more tumours in 7% of cases at the time of analysis (a total of 19 tumours from 14 mice were analysed). K8-CreERT2/Pik3ca(H1047R)/p53fl/fl/Rosa26-YFP mice presented 1 tumour in 20%, 2 tumours in 45% and 3 or more tumours in 35% of cases at the time of analysis (a total of 40 tumours from 20 mice were analysed). K8-CreERT2/Pik3ca(H1047R)/p53fl/+/Rosa26-YFP mice presented 1 tumour in 71% and 2 tumours in 29% of cases at the time of analysis (a total of 22 tumours from 17 mice were analysed). Each harvested tumour was cut into three pieces: one for paraffin embedding, one for OCT embedding and one for cell sorting and RNA extraction. Tumours were classified on the basis of histological features.

Mammary colony-forming assay. Luminal YFP+ cells from K5-CreERT2/Pik3ca(H1047R)/Rosa26-YFP mice induced for 12 months were flow-sorted as a single-cell suspension based on their Lin−CD29loCD24+YFP+ profile. Control luminal cells from K5-CreERT2/Rosa26-YFP mice induced for 12 months were sorted based on their CD29loCD24+ profile. Luminal cells were cultured with irradiated NIH 3T3 feeder cells in Mouse EpiCult-B medium (Stem Cell Technologies) supplemented with 10 ng ml−1 epidermal growth factor (Sigma-Aldrich), 10 ng ml−1 basic fibroblast growth factor (R&D Systems), 4 μg ml−1 heparin (Sigma-Aldrich), 1 mg ml−1 bovine serum albumin (BSA; Sigma-Aldrich), 5% FBS (Life Technologies), 50 units ml−1 penicillin and 50 μg ml−1 streptomycin (Life Technologies), as previously described. After 1 week, colonies were fixed with methanol, stained with Giemsa stain (Sigma-Aldrich) and counted manually.

Mammary fat pad transplantation and analysis. Eight thousand LCs from 
K8-CreERT2/Pik3caH1047R/Rosa26-YFP or control K8-CreERT2/Rosa26-YFP mice, or
1,350 BCs from K8-CreERT2/Pik3caH1047R/Rosa26-YFP or control K14-rtTA/
TetO-Cre/Rosa26-YFP mice induced for 4 months, were sorted based on their
Lin− YFP+ CD29lo CD24+ or Lin− YFP+ CD29hi CD24+ profiles, respectively. LCs
were resuspended in 10 µl DMEM plus 50% bovine serum. BCs were sorted in the
presence of 10 µM ROCK inhibitor (Y27632, Sigma) and resuspended in 75% DMEM/25%
Matrigel. The cell suspension was injected into the fourth mammary gland of
3-to-4-week-old NOD-SCID mice that had been cleared of endogenous epithelium as
previously described'*””’. Recipient mice were mated 4 weeks after the transplantation,
and killed 2-to-3 weeks later, when fully pregnant. Recipient glands were
dissected and stained for GFP, K8 and K5 as whole mounts. An outgrowth was 
defined as an epithelial structure comprising ducts and lobules and/or terminal 
end buds. 

Quantification of keratin+ cells within YFP+ cells. A total of 1,907, 1,704 and
2,391 YFP+ cells from three different mice per condition were analysed,
respectively, in K8-CreERT2/Rosa26-YFP mice induced for 4 weeks and in
K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice induced for 1 week and 8 weeks, on
5 µm cryosections stained for K5, K8 and GFP. Coexpression of these markers was
analysed with a confocal microscope. Cells were scored as K8+K5− (K8), K8+K5+
(K5K8) or K8−K5+ (K5); the results are shown in Fig. 3g.

Quantification of clone composition. Mammary glands were processed as whole 
mount and stained for K8, K5 and GFP. Clones were analysed by confocal micro- 
scopy. A total of 822, 936, 714 and 360 clones from three independent mice per 
condition were analysed in K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice induced for
1 week and for 10 weeks, and in K5-CreERT2/Pik3caH1047R/Rosa26-YFP mice
induced for 1 week and for 7 months, respectively, at a dose of TAM that
labelled very few, isolated cells. The clones were scored in three classes according
to their keratin expression: luminal clones, composed only of K8+ cells; basal
clones, composed only of K5+ cells; and mixed clones, composed of K8+ and K5+
cells. These data are shown in Extended Data Figs 6p and 7j.
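
The three-class clone scoring described above can be sketched in Python. This is an illustrative sketch only (the clones in the study were scored from confocal images, not with this code); the per-cell input format of (K8-positive, K5-positive) flags is a hypothetical choice, as is the convention that a K8+K5+ double-positive cell counts towards both lineages.

```python
from collections import Counter


def classify_clone(cells):
    """Return 'luminal', 'basal', 'mixed' or 'unscored' for one YFP+ clone.

    `cells` is a hypothetical input: a list of (k8_positive, k5_positive)
    boolean pairs, one per cell of the clone."""
    has_k8 = any(k8 for k8, _ in cells)
    has_k5 = any(k5 for _, k5 in cells)
    if has_k8 and has_k5:
        return "mixed"      # clone contains K8+ and K5+ cells
    if has_k8:
        return "luminal"    # only K8+ cells
    if has_k5:
        return "basal"      # only K5+ cells
    return "unscored"       # no keratin signal: excluded from the distribution


def clone_distribution(clones):
    """Percentage of scored clones falling in each class."""
    counts = Counter(classify_clone(c) for c in clones)
    scored = counts["luminal"] + counts["basal"] + counts["mixed"]
    return {label: 100.0 * counts[label] / scored
            for label in ("luminal", "basal", "mixed")}


# Invented example: two single-cell luminal clones, one mixed, one basal clone.
clones = [
    [(True, False)],
    [(True, False)],
    [(True, False), (False, True)],
    [(False, True), (False, True)],
]
print(clone_distribution(clones))  # {'luminal': 50.0, 'basal': 25.0, 'mixed': 25.0}
```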

Quantification of percentage YFP-labelled cells. The percentage of YFP-labelled
cells within the luminal and basal populations was quantified by FACS. The
luminal population was defined as the CD29lo CD24+ population and the basal
population was defined as the CD29hi CD24+ population.
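
As a rough illustration of these gate-based percentages, the Python sketch below assigns CD24+ events to a CD29lo (luminal) or CD29hi (basal) population using simple intensity thresholds and reports the YFP+ fraction in each. Real FACS gates are drawn as regions on the CD29/CD24 plot after the debris, doublet, DAPI and Lin gates (Extended Data Fig. 4); the threshold values and the (CD29, CD24, YFP) event tuples here are hypothetical.

```python
def yfp_percentages(events, cd29_hi_min, cd24_pos_min):
    """Percentage of YFP+ events in the luminal and basal gates.

    events: iterable of (cd29_intensity, cd24_intensity, is_yfp_positive),
    assumed to be already Lin-negative, live, single cells.
    cd29_hi_min, cd24_pos_min: hypothetical gate thresholds."""
    totals = {"luminal": 0, "basal": 0}
    yfp_pos = {"luminal": 0, "basal": 0}
    for cd29, cd24, is_yfp in events:
        if cd24 < cd24_pos_min:
            continue  # CD24-negative events fall outside both gates
        population = "basal" if cd29 >= cd29_hi_min else "luminal"
        totals[population] += 1
        if is_yfp:
            yfp_pos[population] += 1
    return {
        pop: (100.0 * yfp_pos[pop] / totals[pop] if totals[pop] else float("nan"))
        for pop in totals
    }


events = [
    (1.0, 5.0, True),    # CD29lo CD24+ YFP+  -> luminal
    (1.0, 5.0, False),   # CD29lo CD24+ YFP-  -> luminal
    (10.0, 5.0, True),   # CD29hi CD24+ YFP+  -> basal
    (10.0, 0.1, True),   # CD24-              -> excluded
]
print(yfp_percentages(events, cd29_hi_min=5.0, cd24_pos_min=1.0))
```

With these invented events, half of the luminal-gate events and all of the basal-gate events are YFP+.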

Whole-mount carmine staining. Whole fourth mammary glands were fixed in methanol
Carnoy (60% methanol, 30% acetic acid, 10% chloroform) for at least 2 h and
rehydrated in 70% ethanol, followed by water. Staining in carmine alum (Sigma)
was done overnight and excess dye was rinsed off with water. This was followed by
incubation in 70%, 95% and 100% ethanol (1 h each) and fat-clearing in toluene
overnight. All steps were carried out at room
temperature. 

Epithelial outgrowth measurement. Carmine-stained mammary glands were 
photographed with a Leica M80 stereomicroscope equipped with a Leica IC80 
HD digital camera. Epithelial outgrowth was scored as the distance between the
distal edge of the lymph node and the most distal tip of the epithelium.
RNA extraction and quantitative real-time PCR. The protocol used for RNA
extraction on FACS-isolated cells has been previously described’’. Briefly, RNA
extraction was performed using the RNeasy micro kit (Qiagen) according to the
manufacturer's recommendations, including DNase treatment. After NanoDrop RNA
quantification and analysis of RNA integrity, purified RNA was used to synthesize
first-strand cDNA in a 50 µl final volume, using Superscript II (Invitrogen) and
random hexamers (Roche). Genomic contamination was detected by performing
the same procedure without reverse transcriptase. Quantitative PCR analyses were
performed with 1 ng of cDNA as template, using FastStart Essential DNA Green
Master (Roche) on a LightCycler 96 real-time PCR system (Roche).

Relative RNA levels were normalized to the housekeeping gene Gapdh. Primers
were designed using the PrimerBank database (http://pga.mgh.harvard.edu/primerbank/)
and are listed in Supplementary Table 4. Results were analysed using LightCycler 96
software (Roche), and relative quantification was performed using the ΔΔCt method
with Gapdh as reference. The entire procedure was repeated in four biologically
independent samples. For Extended Data Figs 6 and 7, data are shown as fold change
over luminal cells or basal cells derived from 3-month-old wild-type mice (L-WT and
B-WT).
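
The ΔΔCt arithmetic can be illustrated with a short Python sketch of the standard 2^−ΔΔCt formula. It assumes roughly 100% amplification efficiency for both primer pairs and simply averages replicate Ct values; the quantification in the study was done in the LightCycler 96 software, so this only shows the calculation, and the Ct values in the example are invented.

```python
from statistics import mean


def ddct_fold_change(target_sample, gapdh_sample, target_control, gapdh_control):
    """Fold change of a target gene in a sample relative to a control population.

    Each argument is a list of replicate Ct values.
    dCt  = Ct(target) - Ct(Gapdh)        (normalization to the reference gene)
    ddCt = dCt(sample) - dCt(control)    (control = e.g. wild-type LCs or BCs)
    fold change = 2 ** -ddCt             (assumes ~100% PCR efficiency)"""
    dct_sample = mean(target_sample) - mean(gapdh_sample)
    dct_control = mean(target_control) - mean(gapdh_control)
    return 2.0 ** -(dct_sample - dct_control)


# The target reaches threshold 3 cycles earlier (relative to Gapdh) in the
# sample than in the wild-type control, i.e. about 8-fold higher expression.
fold = ddct_fold_change([22.0, 22.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
print(fold)  # 8.0
```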

Microarray analysis. Total RNA was analysed using mouse whole-genome MG- 
430 PM array from Affymetrix at the IRB Functional Genomics Core. All the 
results were normalized by RMA using the R/Bioconductor package affy with
standard parameters**”’. Two biologically independent samples were analysed for
each condition, except for tumours derived from K5-CreERT2/Pik3caH1047R/Rosa26-YFP
or K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice, for which three and seven samples were
analysed, respectively. The following populations were analysed: sorted BCs from
K5-CreERT2/Rosa26-YFP mice induced for 8 weeks or 10–12 months; LCs from
K8-CreERT2/Rosa26-YFP mice induced for 8 weeks or 10–12 months; BCs and LCs from
K5-CreERT2/Pik3caH1047R/Rosa26-YFP mice induced for 10–12 months; BCs and LCs from
K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice induced for 8 weeks; Lin− cells from
K5-CreERT2/Pik3caH1047R/Rosa26-YFP-derived tumours and from
K8-CreERT2/Pik3caH1047R/Rosa26-YFP-derived tumours numbers 1, 2 and 7; and YFP+
cells from K8-CreERT2/Pik3caH1047R/Rosa26-YFP-derived tumours numbers 3, 4, 5 and 6
and from K8-CreERT2/Pik3caH1047R/p53fl/fl/Rosa26-YFP-derived tumours. The gene
signature induced by the expression of PIK3CA(H1047R) in each population was
determined by comparing its transcriptional profile with that of LCs from
age-matched K8-CreERT2/Rosa26-YFP mice or BCs from K5-CreERT2/Rosa26-YFP mice.
Only genes upregulated or downregulated by at least 1.5-fold were considered in
the analysis.
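
The 1.5-fold cut-off can be applied on the log2 scale on which RMA reports expression, as in the Python sketch below. It assumes one log2 value per gene and condition (how replicates were combined is not stated in the text, so averaging them beforehand is an assumption), and the gene names in the example are placeholders.

```python
import math


def regulated_genes(log2_test, log2_control, min_fold=1.5):
    """Split genes into up- and downregulated lists by a fold-change cut-off.

    log2_test, log2_control: dict gene -> log2 expression (e.g. RMA values,
    assumed to be already averaged over replicates)."""
    cutoff = math.log2(min_fold)  # 1.5-fold corresponds to ~0.585 on the log2 scale
    up, down = [], []
    for gene in sorted(log2_test.keys() & log2_control.keys()):
        log2_fc = log2_test[gene] - log2_control[gene]
        if log2_fc >= cutoff:
            up.append(gene)
        elif log2_fc <= -cutoff:
            down.append(gene)
    return up, down


test = {"geneA": 9.0, "geneB": 6.0, "geneC": 8.3}
control = {"geneA": 8.0, "geneB": 7.0, "geneC": 8.0}
print(regulated_genes(test, control))  # (['geneA'], ['geneB'])
```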

Microarray data clustering. Clustering and bootstrap analyses were performed
using the pvclust and gplots packages of the R statistical suite*®. Clustering was
performed with the default parameters of the R hclust function (Euclidean distance
and complete linkage), considering only the 500 most variable genes across all
experiments.
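
The clustering step (selecting the most variable genes, then Euclidean distance with complete linkage) can be sketched in pure Python. This is a naive illustration, not a reimplementation of R's hclust or pvclust, and it omits the bootstrap; ranking genes by sample variance, the helper names and the toy matrix are assumptions for the example.

```python
import math
from statistics import variance


def top_variable_genes(matrix, n=500):
    """Names of the n genes with the highest variance across samples.

    matrix: dict gene -> list of expression values (one per sample)."""
    return sorted(matrix, key=lambda g: variance(matrix[g]), reverse=True)[:n]


def sample_vectors(matrix, genes, sample_names):
    """Per-sample expression vectors restricted to the selected genes."""
    return {s: [matrix[g][j] for g in genes] for j, s in enumerate(sample_names)}


def complete_linkage(samples):
    """Naive agglomerative clustering (Euclidean distance, complete linkage).

    samples: dict name -> vector. Returns the merges as (height, members)."""
    clusters = {name: [name] for name in samples}
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # Complete linkage: cluster distance = largest pairwise distance.
                d = max(math.dist(samples[x], samples[y])
                        for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((height, sorted(clusters[a])))
    return merges


matrix = {
    "g1": [0.0, 0.0, 10.0, 10.0],
    "g2": [1.0, 1.0, 1.0, 1.0],   # invariant gene, filtered out
    "g3": [0.0, 1.0, 10.0, 11.0],
}
genes = top_variable_genes(matrix, n=2)
vectors = sample_vectors(matrix, genes, ["s1", "s2", "s3", "s4"])
for height, members in complete_linkage(vectors):
    print(round(height, 2), members)
```

In the toy data, s1 and s2 merge first, then s3 and s4, and the two pairs join last at the largest height, mirroring how samples with similar profiles group in the dendrogram.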

Gene signature comparison. Venn diagrams were computed with the R statistical
tool. The reported hypergeometric P values for every comparison between two
signatures correspond to the probability of observing an intersection at least as
large by chance alone, given the number of genes tested on the microarray chip.
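
The hypergeometric overlap P value can be written out directly with exact binomial coefficients, as in this Python sketch. It should correspond to R's `phyper(k - 1, n_a, N - n_a, n_b, lower.tail = FALSE)`, although the exact R call used in the study is not given; the toy numbers are invented.

```python
from math import comb


def overlap_pvalue(n_genes, size_a, size_b, overlap):
    """P(intersection >= overlap) for two random gene sets drawn from n_genes.

    Hypergeometric upper tail: sum over i >= overlap of
    C(size_a, i) * C(n_genes - size_a, size_b - i) / C(n_genes, size_b)."""
    upper = min(size_a, size_b)
    hits = sum(comb(size_a, i) * comb(n_genes - size_a, size_b - i)
               for i in range(overlap, upper + 1))
    return hits / comb(n_genes, size_b)  # exact integers until this division


# Toy example: 10 genes on the "chip", two 3-gene signatures sharing all 3 genes.
print(overlap_pvalue(10, 3, 3, 3))  # 1/120, about 0.0083
```

Because the sum is computed with exact integers, the sketch stays accurate for chip-sized gene universes of tens of thousands of probes.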

Comparison of murine and human breast tumour gene expression profiles. To
compare the murine tumour gene expression data with human tumour data, we used
the METABRIC data set composed of 1,992 patients*!. METABRIC expression
data were downloaded from the EBI website (data sets EGAD00010000210 and
EGAD00010000211). When multiple probes mapped to the same Entrez gene
identifier, we kept the one with the highest variance in the data set using the
genefu package. The PAM50 subtypes were computed using the dedicated function of
the Bioconductor genefu package’? (1,448 basal, 1,027 HER2+, 2,260 LumB,
2,162 LumA and 323 normal).
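
The probe-collapsing rule (one probe per Entrez gene, keeping the most variable one) is illustrated by the Python sketch below. The study performed this step with genefu; this independent sketch only shows the selection logic, and the probe and gene identifiers are placeholders.

```python
from statistics import variance


def collapse_probes(expression, probe_to_gene):
    """Pick one probe per gene: the one with the highest variance across patients.

    expression: dict probe -> list of values; probe_to_gene: dict probe -> gene ID.
    Probes without a gene mapping are dropped; ties keep the first probe seen."""
    best = {}  # gene -> (variance, probe)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        v = variance(values)
        if gene not in best or v > best[gene][0]:
            best[gene] = (v, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


expression = {
    "probe_1": [1.0, 2.0, 3.0],
    "probe_2": [1.0, 1.0, 1.5],
    "probe_3": [0.0, 5.0, 2.0],
    "probe_4": [4.0, 4.0, 4.1],   # unmapped probe, dropped
}
probe_to_gene = {"probe_1": "gene_A", "probe_2": "gene_A", "probe_3": "gene_B"}
print(collapse_probes(expression, probe_to_gene))
```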

Boxplots and Kruskal–Wallis test P values were computed using R. P values
reflect the probability that at least one of the cancer subtypes expresses the tested
signature at a significantly different level.
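
The Kruskal–Wallis test compares signature scores across subtypes using ranks. A minimal Python sketch of the tie-corrected H statistic follows, together with a closed-form chi-square upper tail that is valid only for an even number of degrees of freedom (for example k − 1 = 4 for the five PAM50 subtypes). The study used R, so this is an illustration rather than the code that produced the reported P values, and the scores in the example are invented.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of score lists."""
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j]; each gets the average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = (12.0 / (n * (n + 1))
         * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    return h / (1 - tie_term / (n ** 3 - n))  # undefined if every value is tied


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # e.g. 3 subtypes
h = kruskal_wallis_h(groups)
p = chi2_sf_even_df(h, df=len(groups) - 1)
print(round(h, 3), round(p, 4))  # 7.2 0.0273
```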

One-sided Student's t-test P values reflect the probability that a signature is
significantly more expressed (or repressed) in one subtype than in all the others.
For the t-tests, the median and interquartile range were chosen as estimators of
central tendency and dispersion (instead of the mean and standard deviation), as
they are more robust to extreme values.
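
The robustness argument for the median and interquartile range can be illustrated with Python's standard `statistics` module. Using `method="inclusive"`, which gives the same quartiles as R's default quantile definition, is an assumption about how the quartiles were computed; the values are invented to show the effect of a single extreme value.

```python
from statistics import mean, median, quantiles


def median_iqr(values):
    """Median and interquartile range (Q3 - Q1) of a sample."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


scores = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 500.0]
print(median_iqr(scores), median_iqr(with_outlier))  # (3.0, 2.0) (3.0, 2.0)
print(mean(scores), mean(with_outlier))              # 3.0 102.0
```

One extreme value shifts the mean from 3.0 to 102.0, whereas the median and interquartile range are unchanged.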

We then merged the murine data set with the METABRIC data set, keeping only
genes described as orthologous in the Ensembl database (downloaded via
BioMart”) that have exactly the same identifier. The batch effect between murine
and human data was corrected using the ComBat function of the Bioconductor sva
package*’. The PCA and clustering analyses were performed using the R statistical
software, considering an expression matrix containing only the expression values
of the 46 PAM50 genes orthologous between mouse and human. For clustering, we
used the Euclidean distance combined with complete-linkage hierarchical clustering
(default parameters). PAM50 subtypes were computed using the
R/Bioconductor genefu package.

Survival analyses in humans. Mouse-derived signatures were converted to
human signatures by considering the orthologous genes in humans. Signature
scores were then computed and rescaled using the dedicated function of the
R/Bioconductor genefu package. The scores were computed for each patient in
METABRIC, together with those in 33 other reference breast tumour
data sets” (7,220 patients). Survival curves were computed using the dedicated
function of the genefu package, only on untreated patients (1,859 cases) with
available survival data. Expression level categories correspond to the tertiles of
the expression values in the untreated patients. P values correspond to the log-rank
P value, which reflects the probability that at least one class of signature
expression has an outcome significantly different from the other classes.
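
Splitting patients into low, intermediate and high expression groups by tertiles can be sketched in Python as follows. Both the quantile definition (`method="inclusive"`, matching R's default) and the convention that a score equal to a cut-point goes to the lower group are assumptions; the log-rank test itself is not reproduced, and the scores are invented.

```python
from statistics import quantiles


def tertile_groups(scores):
    """Label each score 'low', 'intermediate' or 'high' by the tertiles of all scores."""
    lower_cut, upper_cut = quantiles(scores, n=3, method="inclusive")
    labels = []
    for s in scores:
        if s <= lower_cut:
            labels.append("low")
        elif s <= upper_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


signature_scores = [0.1, 0.4, 0.2, 0.9, 0.5, 0.8, 0.3, 0.7, 0.6]
print(tertile_groups(signature_scores))
```

Each patient keeps its position in the list, so the labels can be joined back to the survival data before computing Kaplan–Meier curves per group.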
Extended Data Figure 1 | Tamoxifen administration has no long-term
effect on the mammary gland. a, b, Effect of TAM on postnatal mammary epithelial
growth. a, Representative whole-mount preparations of carmine alum-stained
mammary epithelium from the fourth mammary gland, showing that TAM induces a
delay in mammary epithelium growth at early time points, but no difference is
observed 8 weeks after TAM induction. b, Mean distance from the distal edge of the
lymph node to the distal epithelial edge 1 week, 5 weeks and 8 weeks after TAM or
oil injection (n = 6, 6, 4, 3, 5, 4 mice for 1 week control (ctr), 1 week TAM,
5 weeks control, 5 weeks TAM, 8 weeks control and 8 weeks TAM, respectively).
P values derived from a two-sided Student's t-test are 0.161, 0.035 and 0.748 when
comparing control and TAM conditions at 1 week, 5 weeks and 8 weeks, respectively.
c, Percentage of YFP+ cells in LCs (CD29lo/CD24+) and in BCs (CD29hi/CD24+)
analysed by FACS 48 h after TAM administration in K5-CreERT2/Pik3caH1047R/
Rosa26-YFP and K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice, or 1 week after
doxycycline administration to K14-rtTA/TetO-Cre/Pik3caH1047R/p53fl/fl/Rosa26-YFP
mice (n = 5, 6, 3 mice for K5-CreERT2, K8-CreERT2 and K14-rtTA/TetO-Cre,
respectively). Circles, individual data points. Scale bars, 100 µm.
Error bars, s.e.m.
Extended Data Figure 2 | Characterization of tumours derived from basal
or luminal cells upon oncogenic Pik3ca expression. a–e, Characterization
of adenomyoepithelioma (adenomyo) tumours derived from K5-CreERT2/
Pik3caH1047R/Rosa26-YFP mice. f–y, Characterization of tumours derived from
K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice. f–j, Characterization of
adenomyoepithelioma. k–o, Characterization of myoepithelial carcinoma (C).
p–t, Characterization of invasive carcinoma of no special type (NST C).
u–y, Characterization of metaplastic carcinoma. a, f, k, p, u, Haematoxylin and
eosin staining. b, g, l, q, v, p63 immunohistochemistry. c, h, m, r, w,
Immunofluorescence of ER/K8. d, i, n, s, x, Immunofluorescence of K8/K14.
e, j, o, t, y, Mean percentage of Ki67+ cells within tumours (n = 6, 3, 3, 1, 3
and total number of cells counted = 10,408, 10,758, 11,174, 4,622, 5,732 in
e, j, o, t, y, respectively). Error bars, s.e.m. Scale bars, 10 µm.
Histological classification of tumours shown:
K5PIK TU1: adenomyoepithelioma
K5PIK TU2: adenomyoepithelioma
K5PIK TU3: adenomyoepithelioma
K8PIK TU1: myoepithelial carcinoma
K8PIK TU2: invasive carcinoma NST
K8PIK TU3: metaplastic carcinoma with mesenchymal differentiation
K8PIK TU4: adenomyoepithelioma
K8PIK TU5: adenomyoepithelioma
K8PIK TU6: adenomyoepithelioma + myoepithelial carcinoma
K8PIK TU7: adenomyoepithelioma
K8PIKp53 TU1: metaplastic carcinoma with mesenchymal differentiation
K8PIKp53 TU2: metaplastic carcinoma with mesenchymal differentiation
Extended Data Figure 3 | Similarities between mouse Pik3ca-derived
mammary tumours and human breast cancers. a–d, Human breast tumour
histologically classified as adenomyoepithelioma, resembling K5-CreERT2/
Pik3caH1047R/Rosa26-YFP-derived tumours (K5PIK TU). a, Haematoxylin and
eosin staining. b–d, p63 (b), K8/K18 (c) and K14 (d) immunohistochemistry in
the human adenomyoepithelioma. e–h, Human breast tumour histologically
classified as metaplastic carcinoma, resembling K8-CreERT2/Pik3caH1047R/
Rosa26-YFP-derived tumours (K8PIK TU). e, Haematoxylin and eosin
staining. f–h, p63 (f), K8/K18 (g) and K14 (h) immunohistochemistry in the
human metaplastic carcinoma. i–k, Principal component analysis (PCA) of
the METABRIC patients together with murine tumours according to the
expression values of the PAM50 genes common to mice and humans. i, PCA of
three K5-CreERT2/Pik3caH1047R tumours (black dots), showing that these
tumours cluster with the human luminal B cancer subtype. j, PCA of seven
K8-CreERT2/Pik3caH1047R/Rosa26-YFP-derived tumours (numbered black dots).
The histological classification of each numbered tumour is described below the
figure. k, PCA of two K8-CreERT2/Pik3caH1047R/p53fl/fl/Rosa26-YFP-derived
tumours (K8PIKp53 TU) (black dots), showing that these tumours cluster
together with the human HER2+ subtype. l, Clustering of the murine tumours
among human tumours of the METABRIC data set. Clustering was performed
by grouping tumours presenting similar expression patterns of PAM50
genes. Colours on top of the heatmap represent the PAM50 subtypes attributed
to the human tumours. The discrepancy between the PCA and clustering analyses
is due to the influence of low HER2 expression in these tumours, as around
60% of PC2 relies on ERBB2 expression. Scale bars, 10 µm.
Extended Data Figure 4 | Gating strategy to analyse and isolate tumour
cells, LCs and BCs according to their YFP, CD29 and CD24 profiles.
a–e, Dot plot FACS analysis of a single-cell suspension of mammary tumour
cells (in this example from a K8-CreERT2/Pik3caH1047R/Rosa26-YFP tumour)
stained for Lin (CD31, CD45, CD140a). Debris was excluded from all
events in P1 (a), doublets were discarded in P2 (b), living cells were gated in
P3 by DAPI dye exclusion (c), non-epithelial Lin+ cells were discarded
in P4 (d), and YFP+ cells were gated in P5 (e). f, Gating strategy used for
FACS analysis and cell sorting, showing the proportion of parent and total
cells for each gate. Tumour cells were isolated based on their Lin− profile for
YFP− tumours (P4 gate), or based on their YFP profile (P5 gate) for YFP+
tumours, as described in Methods. g–m, Dot plot FACS analysis of a single-cell
suspension of mammary cells (in this example from K5-CreERT2/Pik3caH1047R/
Rosa26-YFP mice 12 months after TAM induction) stained for CD24, CD29 and Lin
(CD31, CD45, CD140a). Debris was excluded from all events in P1 (g), doublets
were discarded in P2 (h), living cells were gated in P3 by DAPI dye exclusion (i),
non-epithelial Lin+ cells were discarded in P4 (j), and YFP+ cells were gated in
P5 (k). l, m, CD29 and CD24 expression were used to gate the CD29lo CD24+
population, corresponding to LCs, and the CD29hi CD24+ population,
corresponding to BCs, either in YFP+ cells (l) or in Lin− cells (m). n, Gating
tree used for FACS analysis and sorting, showing the proportion of parent and
total cells for each gate.
Extended Data Figure 5 | Characterization of tumours derived from BCs
or LCs upon concomitant expression of oncogenic Pik3ca and deletion of
p53. a–o, Characterization of tumours derived from K14-rtTA/TetO-Cre/
Pik3caH1047R/p53fl/fl/Rosa26-YFP mice. a–e, Characterization of
adenomyoepithelioma (adenomyo). f–j, Characterization of myoepithelial carcinoma.
k–o, Characterization of metaplastic carcinoma. p–a′, Characterization of
tumours derived from K8-CreERT2/Pik3caH1047R/p53fl/fl/Rosa26-YFP mice.
p–t, Characterization of myoepithelial carcinoma (C). u–a′, Characterization
of metaplastic carcinoma. a, f, k, p, u, Haematoxylin and eosin staining.
b, g, l, q, v, p63 immunohistochemistry. c, h, m, r, w, Immunofluorescence of
K8/ER. d, i, n, s, x, Immunofluorescence of K8/K14. e, j, o, t, y, Mean percentage
of Ki67+ cells within tumours (n = 4, 3, 3, 4, 6 tumours and total cells
counted = 11,903, 10,670, 6,992, 14,743, 8,172 in e, j, o, t, y, respectively).
z, Immunofluorescence of K8/HER2. a′, Immunofluorescence of E-cadherin/
vimentin. Error bars, s.e.m. Scale bars, 10 µm.
Extended Data Figure 6 | Oncogenic Pik3ca expression induces
multipotency in unipotent luminal progenitors. a–d, Immunofluorescence
showing the expression of K8/YFP (a, c) or K5/YFP (b, d) 1 week (a, b) and
7 months (c, d) after TAM injection in control K8-CreERT2/Rosa26-YFP
mammary gland. e, Percentage of YFP+ cells in LCs (CD29lo/CD24+) and in
BCs (CD29hi/CD24+) at different time points after TAM administration to
K8-CreERT2/Rosa26-YFP mice (n = 3 mice per time point), showing that no
YFP+ cells expressing CD29hi/CD24+ were detected in control K8-CreERT2/
Rosa26-YFP mammary glands at any time point. f–h, Immunofluorescence of
K14/YFP (f), p63/YFP (g) and SMA/YFP (h) 8 weeks (f, g) or 10 weeks (h) after
TAM administration to K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice, showing
that the BCs arising from LCs upon oncogenic Pik3ca targeting express these
classical BC markers. i–m, Induction of Pik3caH1047R expression in LCs
in adult mice. i–l, Immunofluorescence showing the expression of K8/YFP
(i, k) or K5/YFP (j, l) 1 week (i, j) and 8 weeks (k, l) after TAM injection in
K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice induced in adulthood.
m, Percentage of YFP+ cells in LCs (CD29lo/CD24+) and in BCs (CD29hi/
CD24+) at different time points after TAM administration to K8-CreERT2/
Pik3caH1047R/Rosa26-YFP mice induced in adulthood (n = 4 mice per time
point). n, o, Immunofluorescence of K5/YFP showing clonal YFP
expression in a single isolated LC 1 week after TAM injection (n), and a clone
that gave rise to an LC and a BC 8 weeks after TAM injection (o), in
K8-CreERT2/Pik3caH1047R/Rosa26-YFP mammary gland. The arrow in n points to
the isolated LC, while the arrow in o points to the newly arisen BC.
p, Distribution of clones 1 week or 10 weeks after TAM injection in
K8-CreERT2/Pik3caH1047R/Rosa26-YFP mice at a clonal dose. Clones were scored as
composed of only luminal cells (luminal clones), only basal cells
(basal clones) or luminal and basal cells (mixed clones) (n = 4
mice per time point). See Methods for more details. q–t, Immunofluorescence
of K5/K8 (q), K5 (r), K8 (s) and K5/K8/YFP (t), showing that in wild-type
mammary gland K5 and K8 are not co-expressed (q), whereas K5/K8 double-positive
cells are observed in K8-CreERT2/Pik3caH1047R/Rosa26-YFP
mammary gland 8 weeks after oncogenic Pik3ca expression in LCs (r–t).
Arrows in r–t point to K5+K8+YFP+ cells. u, v, RT-PCR analysis of luminal
(u) or basal (v) genes in YFP+ LCs and BCs sorted from K8-CreERT2/
Pik3caH1047R/Rosa26-YFP mice induced for 1 week, 4 weeks or 8 weeks, in
YFP+ LCs derived from K8-CreERT2/Rosa26-YFP mice and in YFP+ BCs derived
from K5-CreERT2/Rosa26-YFP mice induced for 8 weeks. Data for luminal
genes are compared to adult wild-type LCs (u), while data for basal genes are
compared to adult wild-type BCs (v) (n = 4 biologically independent
samples per condition). Circles, individual data points. Scale bars, 10 µm.
Error bars, s.e.m.
Extended Data Figure 7 | Oncogenic Pik3ca expression induces
multipotency in unipotent basal progenitors. a–c, Immunofluorescence showing
the expression of K5/YFP (a, b) or K8/YFP (c) at 1 week (a) and 7 months
(b, c) in control K5-CreERT2/Rosa26-YFP mammary gland. d, Percentage of
YFP+ cells in LCs (CD29lo/CD24+) and in BCs (CD29hi/CD24+) at different
time points after TAM administration to K5-CreERT2/Rosa26-YFP mice (n = 5, 4, 4,
3 mice for 1 week, 8 weeks, 7 months and 12 months, respectively), showing
that no YFP+ cells expressing CD29lo/CD24+ were detected in control
K5-CreERT2/Rosa26-YFP mammary glands at any time point. e–h,
Immunofluorescence of K19/YFP (e), ER/YFP (f), PR/YFP (g) and claudin 3/YFP (h)
8 months after TAM administration to K5-CreERT2/Pik3caH1047R/Rosa26-YFP mice,
showing that LCs arising from BCs upon oncogenic Pik3ca targeting express
these classical LC markers. i, Immunofluorescence of K5/YFP showing
YFP expression in a single isolated BC 1 week after TAM injection at a
clonal dose. The arrow points to the isolated BC. j, Distribution of clones 1 week
or 7 months after TAM injection in K5-CreERT2/Pik3caH1047R/Rosa26-YFP mice at
a clonal dose. Clones were scored as composed of only luminal cells (luminal
clones), only basal cells (basal clones) or luminal and
basal cells (mixed clones) (n = 3, 4 mice for 1 week and 7 months, respectively).
See Methods for more details. k, l, RT-PCR analysis of luminal (k) or basal
(l) genes in YFP+ LCs and BCs sorted from K5-CreERT2/Pik3caH1047R/Rosa26-YFP
mice induced for 10–12 months, in YFP+ LCs derived from K8-CreERT2/
Rosa26-YFP mice and in YFP+ BCs derived from K5-CreERT2/Rosa26-YFP
mice induced for 10–12 months. Data for luminal genes are compared to
adult wild-type LCs (k), while data for basal genes are compared to adult
wild-type BCs (l) (n = 4 biologically independent samples per condition).
m, Confocal microscopy analysis of immunofluorescence of YFP, Ntrk2 and K8
in mammary glands 7 months after Pik3ca expression in BCs, showing that
the newly formed LCs co-express Ntrk2 and K8. The arrow points to a newly formed
K8+/Ntrk2+/YFP+ cell. Circles, individual data points. Scale bars, 10 µm.
Error bars, s.e.m.
Extended Data Figure 8 | Molecular characterization of oncogenic
Pik3ca-induced multipotency. a, b, Venn diagrams representing the common and
distinct upregulated (a) and downregulated (b) genes in BCs and LCs after
Pik3ca expression in BCs and LCs compared to age-matched control BCs
and LCs, respectively, with the name of each gene list and the number of genes
in each section. The list of genes in each Venn section is provided in
Supplementary Tables 2 and 3. c–l, Venn diagrams representing the common
genes upregulated (c, e, g, i, k) or downregulated (d, f, h, j, l) in the newly
generated LCs or BCs after Pik3caH1047R expression in unipotent progenitors
(c, d); in LCs and in BCs after Pik3caH1047R expression in LCs (genes regulated
following the initial targeting of Pik3caH1047R in LCs, and thus reflecting
the LC of origin) (e, f); in LCs and in BCs after Pik3caH1047R expression in BCs
(genes regulated by Pik3caH1047R in BCs, and thus reflecting the BC of origin)
(g, h); in BCs after Pik3caH1047R expression in LCs and in BCs (genes
regulated by Pik3caH1047R expression in BCs, irrespective of cell of origin)
(i, j); and in LCs after Pik3caH1047R expression in BCs and in LCs (genes regulated
by Pik3caH1047R expression in LCs, irrespective of cell of origin) (k, l). The
diameter of each diagram is proportional to the number of genes it contains. The
reported hypergeometric P values correspond to the probability of observing
an intersection of at least this size by chance alone, given the number of genes
tested on a microarray chip.
Extended Data Figure 9 | Genes of the luminal-to-basal multipotency
signature correlate with patient outcome in untreated breast cancer
patients. a–d, Disease-free survival in untreated patients according to the level
of expression (low, blue; intermediate, green; high, red) of the genes
in the luminal-to-basal multipotency signature, namely NGF (a), INHBA
(b), ITGB6 (c) and WNT10A (d), showing that genes of the luminal-to-basal
multipotency signature predict disease-free survival in untreated breast cancer
patients. Patients expressing high levels of this signature are more prone to
tumour relapse, whereas those expressing lower levels show lower rates of
relapse. The log-rank P values account for the significance of this
difference.
Mitochondria are involved in a variety of cellular functions, including
ATP production, amino acid and lipid biogenesis and breakdown,
signalling and apoptosis’ *. Mitochondrial dysfunction has
been linked to neurodegenerative diseases, cancer and ageing*. 
Although transcriptional mechanisms that regulate mitochondrial 
abundance are known’*, comparatively little is known about how 
mitochondrial function is regulated. Here we identify the metabo- 
lite stearic acid (C18:0) and human transferrin receptor 1 (TFR1; 
also known as TFRC) as mitochondrial regulators. We elucidate a 
signalling pathway whereby C18:0 stearoylates TFR1, thereby 
inhibiting its activation of JNK signalling. This leads to reduced 
ubiquitination of mitofusin via HUWE1, thereby promoting mito- 
chondrial fusion and function. We find that animal cells are poised 
to respond to both increases and decreases in C18:0 levels, with 
increased C18:0 dietary intake boosting mitochondrial fusion 
in vivo. Intriguingly, dietary C18:0 supplementation can counter- 
act the mitochondrial dysfunction caused by genetic defects such as 
loss of the Parkinson’s disease genes Pink or Parkin in Drosophila. 
This work identifies the metabolite C18:0 as a signalling molecule 
regulating mitochondrial function in response to diet. 

To study the function of very long chain fatty acids, we analysed 
Drosophila lacking Elovl6 (refs 6, 7), the enzyme elongating C16 
fatty acids to C18. Sequence analysis identified noa® as fly Elovl6 
(herein referred to as Elovl6). On standard laboratory food, Elovl6
loss-of-function animals (l(3)02281−/−; Elovl6−/−) die as early larvae®
(Fig. 1a). We confirmed that Elovl6 mutants have impaired
C16:0→C18:0 elongase activity and reduced C18:0 levels (Extended
Data Fig. 1a, b), and that their lethality is rescued by human ELOVL6
(Extended Data Fig. 1c, d). Survival to pupation was rescued by supplementing
fly food (which contains little lipid) with C18:0 (Fig. 1a), but
not C18:1 or C20:0 (Extended Data Fig. 1e), confirming that the larval
lethality is due to C18:0 deficiency.

We serendipitously discovered that removing antifungal agents
from fly food improved survival of Elovl6−/− mutants (Fig. 1a). Since
these agents are mitotoxins, this suggested that Elovl6 mutants might
be hypersensitive to mitochondrial inhibition. Indeed, sub-lethal
concentrations of rotenone, a mitochondrial respiratory chain complex I
inhibitor, killed Elovl6−/− mutants when added to antifungal-free food
(Fig. 1b), but other drugs did not (Extended Data Fig. 1f). Thus,
mitochondrial function is limiting in Elovl6−/− mutants. Elovl6−/− mutants
have impaired mitochondrial respiration, rescued by dietary C18:0
supplementation (Fig. 1c) or by expressing an alternative oxidase,
AOX?, allowing bypass of complexes III and IV (Fig. 1d). Complex
IV activity was not impaired in Elovl6−/− mutants (Extended Data Fig.
1g), suggesting that Elovl6−/− mutants suffer from a complex III defect.

If the main cause of Elovl6 lethality is reduced mitochondrial
function, then viability should be rescued by restoring mitochondrial
functional capacity. Indeed, Elovl6−/− viability was rescued by expressing
AOX or Spargel (Drosophila PGC1A), driving mitochondrial
Figure 1 | Drosophila lacking C18:0 have impaired mitochondrial function.
a, Elovl6 mutant larval lethality is rescued by dietary C18:0 (10% in food) or by
removal of mitotoxic antifungal reagents (n = 4 × 60 animals per vial). b, Elovl6
mutants are sensitive to sub-lethal concentrations (100 µM) of rotenone
(n = 4 × 30 animals per vial). c, d, Elovl6 mutants have impaired respiration
(c, left), rescued by supplementing food with C18:0 (10%) (c, right), or by
expressing Ciona intestinalis alternative oxidase (AOX) (d), allowing bypass of
complexes III and IV. n = 4 × 6 animals. e, f, Survival to pupation of Elovl6
mutants is rescued by ubiquitous expression of Spargel (e) or AOX (f). χ² tests,
P = 0.05. n = 195 (e) or 81 (f). g, Amino- or carboxy-terminus-tagged Drosophila
Elovl6 localizes to mitochondria, visualized with mitoGFP in S2 cells. DAPI,
4′,6-diamidino-2-phenylindole; HA, haemagglutinin. Scale bar, 10 µm (n = 4). For
details, see Supplementary Methods. Error bars show standard deviation (s.d.).
a–d, **P < 0.01, *P < 0.05, not significant (NS) P > 0.05, two-tailed t-test.
[Figure 2 image: a, body wall (control, Elovl6 mutant); b, Drosophila S2 cells (control, Elovl6 RNAi); c, HeLa cells (control, delipidated serum); MitoTracker and DAPI staining.]
Figure 2 | C18:0 is required for mitochondrial fusion. a, Elovl6 mutants have
fragmented mitochondria (top), rescued by dietary C18:0 (10% in food). Bottom,
fragmentation quantified (8 fields from 4 animals). b, Elovl6 knockdown in
Drosophila cells causes mitochondrial fragmentation, reversed by supplementing
medium with 100 µM C18:0 for 120 min. Bottom, quantification. RNAi, RNA
interference. n = 50. ***P < 0.001, Mann–Whitney test. c, C18:0 removal by
biogenesis (Fig. 1e, f and Extended Data Fig. 1h, i). Thus, the organismal
function of C18:0 is less pleiotropic than expected. Interestingly,
Drosophila Elovl6 localizes to mitochondrial outer membranes (Fig. 1g
and Extended Data Fig. 1j).

Lipidomic analysis of purified larval mitochondria (Extended Data 
Fig. 5b) revealed that their membranes have little C18:0 (Extended 
Data Fig. 2a), suggesting that C18:0 does not have a structural role 
in mitochondria. Elovl6−/− mutants also did not have fewer mitochondria
than controls (assayed by porin levels and citrate synthase activity;
Extended Data Fig. 2b, c). We therefore investigated whether C18:0
regulates mitochondrial activity. Mitochondria dynamically undergo fusion
and fission, forming tubular structures’®”*. Elovl6−/− mutants had
hyperfragmented mitochondria, rescued by dietary C18:0 supplementation
(Fig. 2a and Extended Data Fig. 2d). Elovl6 knockdown in S2 cells
reproduced this phenotype (Fig. 2b), indicating that it is cell autonomous.
Importantly, we grow S2 cells in serum-free medium, which lacks
fatty acids normally bound to bovine serum albumin (BSA) in serum. 
Mitochondria of Elovl6-knockdown cells rapidly re-fused upon addi- 
tion of BSA-conjugated C18:0 to the medium for just 120 min (Fig. 2b) 
or 20 min (data not shown). Mitochondria of HeLa cells also fragmen- 
ted when grown in medium with serum that was delipidated by 
organic extraction (Fig. 2c). This was rescued by re-adding BSA-con- 
jugated C18:0 for 2 h (Fig. 2c), but not other fatty acids (Extended Data 
Fig. 2e). One lipid specific to mitochondria is lipoic acid. Knockdown 
of lipoic acid synthase (LIAS) led to reduced lipoic acid levels but not 


mitochondrial fragmentation (Extended Data Fig. 2f), and depletion of 
C18:0 did not affect levels of lipoic acid or lipoylated proteins 
(Extended Data Fig. 2g, h), indicating that the effects of C18:0 are 
independent of lipoic acid. Thus, C18:0 regulates mitochondrial mor- 
phology in fly and human cells. 

Mitochondrial fragmentation results from either hyperactive fission or impaired fusion. Blocking fission with mdivi-1, a DRP1 inhibitor, induced mitochondrial fusion in control cells but not in HeLa cells cultured without C18:0 (Fig. 2d), indicating that fusion is impaired in this condition. Mitochondria labelled with photoactivatable mitochondrially targeted green fluorescent protein (mitoGFP) rapidly fused with the rest of the network (stained with MitoTracker Red) in control cells, but not in cells cultured without C18:0, demonstrating impaired fusion (Fig. 2e).

Mitochondrial fusion is regulated largely by mitofusins. Epistasis experiments indicated that C18:0 acts upstream of mitofusin to regulate mitochondrial morphology: expression of Drosophila mitofusin (dMfn; also known as Marf) rescued mitochondrial fragmentation in Elovl6-knockdown cells (Fig. 3a), indicating that dMfn acts downstream of C18:0. C18:0 did not induce fusion in the absence of dMfn (Fig. 3b), indicating that C18:0 requires dMfn for its action. Likewise, expression of dMfn rescued Elovl6− larval lethality (Fig. 3c). To test whether C18:0 can rescue dMfn loss of function, we generated dMfn-knockout flies. These flies phenocopy Elovl6− mutants: they have fragmented mitochondria and reduced mitochondrial respiration, and they die as early-stage larvae that do not grow (Extended Data Fig. 3a–d). Dietary supplementation with C18:0 had no effect on the growth or viability of dMfn knockouts (Fig. 3d).

We asked whether C18:0 affects Mfn via post-translational modifications (PTMs). Mfn from Elovl6− mutant larvae, or from HeLa cells growing with delipidated serum, migrated differently in SDS-polyacrylamide gel electrophoresis (SDS-PAGE) compared with control conditions (Extended Data Fig. 3e, f). Immunoprecipitating MFN2 from HeLa cells treated with or without C18:0 and probing with antibodies detecting various PTMs revealed that MFN2 from cells without C18:0 is hyper-ubiquitinated (Fig. 3e, lanes 1, 3, 5 (endogenous proteins) and Extended Data Fig. 3g (tagged proteins)). Several ubiquitin ligases target MFN2 (refs 16–18). Only knockdown of HUWE1 rescued the mitochondrial fragmentation (Fig. 3f and Extended Data Fig. 4a–c) and MFN2 hyper-ubiquitination (Fig. 3e, lane 3 versus 4) caused by C18:0 removal, as well as the lethality of Elovl6− mutant flies (knockdown of CG8184, the Drosophila HUWE1 orthologue; Extended Data Fig. 4d), identifying HUWE1 as the C18:0-responsive ubiquitin ligase. As expected, increased MFN2 ubiquitination destabilized MFN2 protein (Extended Data Fig. 3h). C18:0 removal did not dramatically reduce MFN2 steady-state levels, partly owing to compensatory increases in MFN2 expression (Extended Data Fig. 3i), suggesting that
Figure 3 | C18:0 acts via TFR1, JNK and HUWE1 to regulate mitofusin. a, b, C18:0 acts upstream of dMfn to regulate mitochondrial morphology in S2 cells. Mitochondrial fragmentation induced by Elovl6 knockdown is reversed by dMfn gain of function (a), whereas C18:0 (100 µM, 2 h) cannot induce mitochondrial fusion in the absence of dMfn (b) (n = 5). dsRNA, double-stranded RNA. c, d, C18:0 acts upstream of dMfn to regulate Drosophila growth and survival. Ubiquitous expression of dMfn rescues lethality of Elovl6 mutants until pupation (c) (χ² test, P ≥ 0.05, n = 685), whereas C18:0 supplementation cannot rescue growth of dMfn mutant flies (d) (n = 6, not significant (NS) P ≥ 0.05). e, f, Ubiquitination of endogenous MFN2 (e) and fragmentation of mitochondria (f) (n = 15) in response to C18:0 removal require the MFN2 ubiquitin ligase HUWE1. IP, immunoprecipitate. g, Pharmacological inhibition of JNK signalling (SP600125, 10 µM) blunts mitochondrial fragmentation induced upon C18:0 removal (24 h delipidated serum). Representative images (left), quantification (right). n = 15. h, TFR1 is required for delipidated serum to induce mitochondrial fragmentation. For representative images, see Extended Data Fig. 7d. n = 15. i, Activation of TFR1 with 1 µM gambogic acid (GA) leads to mitochondrial fragmentation, which is inhibited by 1 h C18:0 pre-treatment. For representative images, see Extended Data Fig. 9b. n = 15. j, JNK signalling is required to induce mitochondrial fragmentation in response to TFR1 activation with gambogic acid. HeLa cells were treated with 10 µM SP600125 before gambogic acid treatment (2 h, 1 µM). Inh, inhibitor. For representative images, see Extended Data Fig. 9e. n = 5. k, Schematic diagram of the signalling route by which C18:0 regulates mitochondrial fusion. All scale bars, 10 µm. Error bars show s.d. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, not significant (NS) P ≥ 0.05, two-tailed t-test. For details, see Supplementary Methods.
ubiquitination additionally blocks MFN2 function in a degradation-independent manner, as for other HUWE1 targets. HUWE1 only ubiquitinates MFN2 that has been phosphorylated on Ser 27 by JNK. Inhibition of JNK prevented mitochondrial fragmentation upon C18:0 removal (Fig. 3g). In sum, C18:0 regulates MFN2 ubiquitination via HUWE1, and thereby mitochondrial morphology and function. Elovl6 mutant flies display other dMfn loss-of-function phenotypes, such as reduced endoplasmic reticulum–mitochondria connections and abnormal cristae (Extended Data Fig. 5).

We asked how C18:0 affects JNK or HUWE1 activity. Endoplasmic reticulum stress can activate JNK. However, C18:0 removal did not induce an unfolded protein response (UPR) (Extended Data Fig. 6a, b), and neither knockdown of UPR effectors nor treatment with tauroursodeoxycholic acid (TUDCA), a chemical chaperone that inhibits endoplasmic reticulum stress, blunted mitochondrial fragmentation upon C18:0 removal (Extended Data Fig. 6c, d). Instead, we hypothesized that C18:0 might regulate proteins via covalent binding ('stearoylation'), analogous to protein palmitoylation. We synthesized C18:0 derivatives with azide or alkyne functionalities, allowing covalent coupling to beads via copper-catalysed azide–alkyne cycloaddition ('click chemistry') (Extended Data Fig. 7b). We tested multiple derivatives, and only C17:0-azide induced mitochondrial fusion like C18:0 (Extended Data Fig. 7a). We treated HeLa cells with C17:0-azide for 2 h, lysed them in 8 M urea to denature proteins, precipitated the lipid by coupling it to beads, and identified covalently bound proteins by mass spectrometry. The most abundant protein in the lipid pulldown was TFR1 (Extended Data Fig. 7b, right), confirmed by immunoblotting (Extended Data Fig. 7c, lanes 1–3). Binding of TFR1 to C17:0-azide was abolished by treating lysates with hydroxylamine at pH 7.5, indicating a thioester linkage (data not shown). We tested the top ten putatively stearoylated protein complexes for effects on mitochondrial morphology by knockdown. Knockdown of TFR1 completely blunted mitochondrial fragmentation upon C18:0 removal (Fig. 3h and Extended Data Fig. 7b, right, d). Thus, TFR1 is required for cells to sense the absence of C18:0.

Since TFR1 is important for cellular iron uptake, TFR1 stearoylation could affect mitochondria via iron uptake or delivery. However, cells grow for days in medium lacking C18:0 but die in medium lacking iron (Extended Data Fig. 8a), suggesting that iron uptake is not markedly impaired in the absence of C18:0. Indeed, cells in medium lacking C18:0 do not show an iron-deficiency transcriptional response (Extended Data Fig. 8b), a drop in protein or activity levels of enzymes containing iron–sulfur clusters (Extended Data Fig. 8c–f), impaired transferrin uptake (Extended Data Fig. 8g), or reduced association of transferrin with mitochondria (Extended Data Fig. 8h), suggesting that the effects of C18:0 are independent of iron. TFR1 also has a signalling function, activating JNK in response to the ligand gambogic acid. Low concentrations of gambogic acid that do not induce apoptosis (Extended Data Fig. 9a) induced rapid mitochondrial fragmentation in HeLa cells (2 h; Fig. 3i). This was suppressed by adding C18:0 (Fig. 3i and Extended Data Fig. 9b), indicating that C18:0 blocks this signalling function of TFR1. Indeed, treatment of HeLa cells with C18:0 reduced JNK activation, as measured by JNK phosphorylation and phospho-JNK nuclear translocation (Extended Data Fig. 9c, d). JNK inhibition blocked the ability of gambogic acid to induce mitochondrial fragmentation (Fig. 3j and Extended Data Fig. 9e). In sum, these data suggest that TFR1 induces mitochondrial fragmentation via JNK, and that this is inhibited by TFR1 stearoylation (Fig. 3k).
Figure 4 | Mitochondrial morphology is
sensitive to dietary C18:0 levels in Drosophila. 
a, Dietary supplementation with 10% C18:0 leads 
to increased mitochondrial fusion in control flies 
(left), quantified as a drop in mitochondrial 
Palmitoyl-transferases covalently bind C16:0 before transferring it to substrates. We found one member of this family, ZDHHC6, in our C17:0-azide pulldowns, suggesting that it is a C18:0 transferase. Indeed, knockdown of ZDHHC6 blunted TFR1 stearoylation (Extended Data Fig. 7c, lane 5). Further work is required to study this in detail.

We noticed that elevating C18:0 levels in control cells increases mitochondrial fusion (Fig. 2b). Supplementing the diet of wild-type flies with C18:0 also increased mitochondrial fusion, whereas starvation of larvae led to mitochondrial fragmentation (Fig. 4). Thus, fly cells respond to both increases and decreases in levels of C18:0.

We asked whether dietary C18:0 supplementation could improve mitochondrial function in pathological conditions. Flies mutant for Pink1 or Parkin are established Parkinson's disease models. They have impaired mitochondrial function and recapitulate Parkinson's disease phenotypes (reduced lifespan, neurodegeneration and impaired motor control). Dietary supplementation with C18:0 rescued the reduced lifespan, low ATP levels and climbing defects of Pink1 mutants and the reduced lifespan of Parkin mutants (Fig. 4 and Extended Data Fig. 10; other Parkin phenotypes were not tested).

We identify C18:0 as a regulator of mitochondrial function. Upon loss of C18:0, TFR1 de-stearoylation activates JNK, leading to HUWE1-dependent MFN2 ubiquitination, impaired MFN2 activity and mitochondrial fragmentation. Loss of C18:0 in flies specifically impacts mitochondrial function, since Elovl6− lethality can be rescued by Spargel, AOX or dMfn expression or by Huwe1 knockdown. To our knowledge, this is the first time stearoylation of a human protein has been found to regulate its function. The link between TFR1 and mitochondria may make sense, because iron enters cells via TFR1 and then mainly travels to mitochondria for iron–sulfur cluster synthesis. Flies are sensitive to dietary C18:0: increased dietary C18:0 leads to increased mitochondrial fusion in vivo. Thus, the metabolite C18:0 acts as a signalling molecule linking diet to mitochondrial function. Intriguingly, dietary C18:0 can also improve mitochondrial function in some pathological conditions in the fly, since dietary supplementation with C18:0 improved the Parkinson's-disease-related phenotypes observed in Pink1 and Parkin mutant flies (see Supplementary Discussion).
Extended Data Figure 1 | noa/Elovl6 is the functional homologue of human ELOVL6. a, C16:0 to C18:0 elongase activity is significantly blunted in Elovl6− mutants, whereas elongase activities on the other fatty acids measured are not affected. Microsomal preparations from control or Elovl6− mutant animals were incubated with radioactive malonyl-CoA and the indicated fatty-acyl-CoA. Elongation was quantified by incorporation of the aqueous metabolite malonyl-CoA into lipid-soluble fatty acids, as described previously. Values represent biological triplicates. b, Gas chromatography flame ionization detector (GC-FID) analysis reveals that Elovl6 mutant larvae have reduced levels of C18:0, the product of Elovl6 elongase activity. Values are the averages of technical duplicates on biological duplicates. Error bars represent standard error of the mean (s.e.m.). c, Lethality of Elovl6 mutants is fully rescued to expected Mendelian ratios by ubiquitous expression (with actin-GAL4) of Elovl6 from a UAS transgene (χ² test, χ² = 1.149 < 3.841, the critical value at P = 0.05; n = 141; ****P < 0.0001). d, Human ELOVL6 and Drosophila Elovl6 are functionally equivalent, since the lethality of Elovl6− mutant flies is fully rescued to expected Mendelian ratios by ubiquitous expression (with actin-GAL4) of human ELOVL6 from a UAS transgene (χ² test, χ² = 2.38 < 3.841, the critical value at P = 0.05; n = 76; **P < 0.01). e, The lethality of Elovl6− mutant flies is most strongly rescued by C18:0, the product of Elovl6. Synchronized 1st instar larvae of the indicated genotypes were grown on standard food supplemented with the indicated fatty acids (5%). The percentage of total animals surviving to pupation was calculated. Values represent the average of biological triplicates. f, Elovl6− mutants are not hypersensitive to drugs such as G418 (protein biosynthesis inhibitor) or etoposide (topoisomerase inhibitor). Thirty synchronized L1 larvae were grown in vials with food supplemented with either G418 (50 µg ml⁻¹) or etoposide (25 µM). The percentage of animals that reached pupation was quantified. Values represent the average of four biological replicates. g, Complex IV activity of Elovl6− larvae is not impaired. Complex IV activity of female pre-wandering larvae was measured with Oroboros high-resolution respirometry. Oxygen consumption was measured in the presence of only N,N,N′,N′-tetramethyl-p-phenylenediamine dihydrochloride (TMPD) as substrate, which can be directly oxidized by complex IV. The values were corrected for non-mitochondrial oxygen consumption (oxygen consumption in the presence of the complex IV inhibitor potassium cyanide (KCN)) and normalized to tissue weight. n = 3. h, i, Overexpression of Spargel in Elovl6 mutant female pre-wandering larvae leads to increased mitochondrial abundance, assessed by porin levels (h; representative of six biological replicates) and citrate synthase activity (i; n = 4). See Supplementary Fig. 12 for image of the uncropped full western blot. j, Drosophila Elovl6 (either N- or C-terminally tagged) localizes to the mitochondrial outer membrane. S2 cell lysates ('total') were successively fractionated to yield crude mitochondria (which include mitochondria-associated membranes (MAMs)), pure mitochondria (lacking MAMs), and mitochondrial outer membranes (OM), inner membranes (IM) and intermembrane space (IMS). Endogenous porin and ATPsyn-α were used as positive controls for outer membranes and inner membranes, respectively. 7.5 µg of protein from each fraction was loaded per lane. See Supplementary Fig. 13 for image of the uncropped full western blot. Representative of two biological replicates. k, Lipidomic analysis of standard fly food reveals low levels of C18:0 in the food. a, c, d, e, f, g, i, Error bars represent s.d. a, b, e–g, i, *P < 0.05, **P < 0.01, not significant (NS) P ≥ 0.05, two-tailed t-test.
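Panels c and d above (and Extended Data Fig. 4d) judge rescue by comparing a χ² statistic with 3.841, the critical value of the χ² distribution with one degree of freedom at P = 0.05. The sketch below shows that goodness-of-fit arithmetic for progeny scored in two classes. The two-class layout, the 1:2 expected ratio (as when homozygous mutants are scored against balancer heterozygotes and balancer homozygotes die) and the counts are illustrative assumptions, not the authors' scoring scheme or data.

```python
# Chi-square goodness-of-fit against an expected Mendelian ratio (sketch).
# Assumption: progeny are scored in exactly two classes, giving 1 degree of freedom.

CRITICAL_DF1_P05 = 3.841  # chi-square critical value for df = 1 at P = 0.05


def chi_square_statistic(observed, expected_fractions):
    """Sum of (observed - expected)^2 / expected over all classes."""
    if len(observed) != len(expected_fractions):
        raise ValueError("need one expected fraction per observed class")
    if abs(sum(expected_fractions) - 1.0) > 1e-9:
        raise ValueError("expected fractions must sum to 1")
    total = sum(observed)
    statistic = 0.0
    for count, fraction in zip(observed, expected_fractions):
        expected = total * fraction
        statistic += (count - expected) ** 2 / expected
    return statistic


def fits_mendelian_ratio(observed, expected_fractions):
    """True if the deviation from the expected ratio is not significant at P = 0.05."""
    if len(observed) != 2:
        raise ValueError("the 3.841 threshold applies to two classes (df = 1)")
    return chi_square_statistic(observed, expected_fractions) < CRITICAL_DF1_P05


# Hypothetical counts: rescued homozygous mutants versus balancer heterozygotes,
# expected at 1:2 among surviving progeny.
print(chi_square_statistic([45, 99], [1 / 3, 2 / 3]))   # 0.28125
print(fits_mendelian_ratio([45, 99], [1 / 3, 2 / 3]))   # True
print(fits_mendelian_ratio([20, 124], [1 / 3, 2 / 3]))  # False (statistic about 24.5)
```

A statistic below 3.841 means the observed counts do not deviate significantly from the expected ratio, which is how "fully rescued to expected Mendelian ratios" is read in these legends.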
Extended Data Figure 2 | C18:0 regulates mitochondrial morphology. a, Lipidomic (GC-FID) profiles of purified mitochondria from Elovl6− mutant 3rd instar larvae do not show major differences compared with control animals. Mitochondrial membranes from both control and mutant animals have very low levels of C18:0. Controls for purity of the mitochondrial preparation are shown in Extended Data Fig. 5b. b, c, Elovl6 mutant larvae do not have reduced amounts of mitochondria, quantified via levels of porin (b; representative of three biological replicates) or citrate synthase activity (c; n = 3). See Supplementary Fig. 14 for image of the uncropped full western blot. d, Elovl6− mutant larvae have fragmented mitochondria, which is rescued by dietary C18:0 supplementation. Mitochondrial morphology in fat bodies of control or Elovl6− mutant female larvae, fed control or C18:0 (10%)-supplemented food, visualized with mitoGFP. Images are representative of eight areas of four larvae from each genotype and food condition. Equivalent pictures for body wall are shown in Fig. 2a. e, Only C18:0, and not shorter, longer or desaturated fatty acids, reduces mitochondrial fragmentation to control levels in HeLa cells grown in medium containing delipidated serum. Mitochondria were visualized with MitoTracker (red) (top), and mitochondrial fragmentation was quantified by normalizing the number of mitochondrial particles to total mitochondrial area (bottom) (n = 15). f, Reduced lipoic acid (LA) levels do not lead to mitochondrial fragmentation. Lipoic acid synthase (LIAS) was knocked down by RNAi in HeLa cells, leading to significantly reduced lipoic acid levels, assayed by immunoblotting of total cell lysates with an antibody detecting lipoic acid (bottom left). Unlike removal of C18:0, this does not lead to mitochondrial fragmentation. Representative images (top left) are quantified (top right) (n = 6). See Supplementary Fig. 14 for image of the uncropped full western blot. g, h, HeLa cells growing in medium containing delipidated serum do not display reduced levels of protein lipoylation (g) or reduced levels of lipoylated proteins (h). HeLa cells were grown in medium containing delipidated serum either for 24 h (the same time point used for all other experiments in which mitochondrial fragmentation was assessed) (g, h) or for an extended period of 4 days (h). Lipoic acid levels were assayed by immunoblotting total cell lysates with an anti-lipoic acid antibody (g), and levels of lipoylated proteins were assessed with specific antibodies (h). See Supplementary Fig. 15 for image of the uncropped full western blot. c, e, f, *P < 0.05, **P < 0.01, not significant (NS) P ≥ 0.05, two-tailed t-test. Error bars represent s.d. Scale bars, 10 µm.
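Panel e quantifies fragmentation by normalizing the number of mitochondrial particles to total mitochondrial area. A minimal sketch of that index on an already-thresholded binary mask follows. The thresholding step, the 4-connectivity used to define a particle, the pixel-area parameter and the toy masks are assumptions for illustration; the legend does not specify the authors' image-analysis pipeline.

```python
from collections import deque

# Hypothetical 2 x 8 binary masks (1 = mitochondrial pixel), each with 12 pixels.
TUBULAR = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
]
FRAGMENTED = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
]


def count_particles(mask):
    """Count 4-connected groups of foreground pixels (assumed particle definition)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    particles = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                particles += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one particle
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return particles


def fragmentation_index(mask, pixel_area_um2=1.0):
    """Particles divided by total mitochondrial area (higher = more fragmented)."""
    area = sum(1 for row in mask for value in row if value) * pixel_area_um2
    if area == 0:
        raise ValueError("mask contains no mitochondrial pixels")
    return count_particles(mask) / area


print(fragmentation_index(TUBULAR))     # 0.08333333333333333 (1 particle / 12 px)
print(fragmentation_index(FRAGMENTED))  # 0.25 (3 particles / 12 px)
```

Dividing by area rather than by cell count keeps the index comparable between cells with different total mitochondrial content: the same 12 pixels score higher when split into three pieces.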
Extended Data Figure 3 | Mitofusin loss of function phenocopies Elovl6 mutation or removal of C18:0. a, dMfn-knockout larvae (1st instar) have fragmented mitochondria, visualized with mitoGFP. Representative of ten images. b, Endogenous dMfn runs as a main band plus a ladder of apparently increasing molecular weights on an SDS-PAGE gel. Specificity is controlled by blotting lysates from control and Mfn-knockout female larvae with anti-Mfn antibody. c, Homozygous mutation of dMfn is lethal. Mfn-knockout larvae survive for several days as small L1/L2 larvae and eventually die. Synchronized 1st instar larvae were grown on standard fly food and examined every 24 h for developmental stage and per cent survival (n = 30). d, dMfn-knockout animals have impaired oxygen consumption. Oxygen consumption of inverted, digitonin-permeabilized female larval tissues was measured with an Oroboros oxygraph chamber and normalized to tissue weight. Oxygen consumption was measured in the presence of the following substrates: GMN (glutamate and malate), GMD (glutamate, malate and ADP), GMcD (glutamate, malate, cytochrome c and ADP), GMScD (glutamate, malate, succinate, cytochrome c and ADP), ETS (glutamate, malate, cytochrome c, ADP and uncoupling reagent), and Sc(Rot)u (glutamate, malate, cytochrome c, ADP and rotenone). n = 5. e, Endogenous dMfn is post-translationally modified in a C18:0-dependent manner in Drosophila. dMfn from Elovl6 female mutants migrates differently in an SDS-PAGE gel compared with dMfn from control animals. This is reversed by supplementing the diet with C18:0. All indicated bands are dMfn, since they disappear in lysates from dMfn-knockout animals (see Extended Data Fig. 3b). Flies were grown on antifungal-free food. f, Endogenous MFN2 is post-translationally modified in a C18:0-dependent manner in human HeLa cells. MFN2 was immunoprecipitated from HeLa cells treated for 24 h with medium containing standard or delipidated serum, and then for 2 h in the absence or presence of C18:0 (100 µM), and lysed in 8 M urea (see Methods). g, C18:0 affects ubiquitination of MFN2. MFN2 is more heavily ubiquitinated in cells treated with delipidated serum than in control cells, and this is reversed by supplementing the medium with C18:0. HeLa cells were co-transfected with tagged versions of MFN2 (Myc) and ubiquitin (HA). Tagged MFN2 was immunoprecipitated and blots were probed with HA antibody to detect ubiquitination. Quantification of ubiquitination, normalized to Myc-MFN2 in the immunoprecipitate (IP), is shown below each lane. h, C18:0 removal destabilizes MFN2 protein. A cycloheximide (CHX) chase experiment was performed to block de novo synthesis of MFN2, thereby following the turnover of existing MFN2 protein in vivo. HeLa cells treated with medium containing delipidated serum plus or minus C18:0 were treated with 100 µM CHX and then lysed at the indicated time points to compare MFN2 protein levels. Bottom, densitometric quantification of the blots normalized to loading control. i, dMfn expression is upregulated in Elovl6 flies compared with controls. dMfn transcript levels in 24 h female pre-wandering larvae were determined by quantitative reverse transcription polymerase chain reaction (RT-PCR), normalized to rp49 (in triplicate). Scale bar, 10 µm. d, i, *P < 0.05, **P < 0.01, ***P < 0.001, not significant (NS) P ≥ 0.05, two-tailed t-test. Error bars represent s.d. See Supplementary Fig. 16 for images of the uncropped full western blots.
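Panel h normalizes MFN2 band intensities in a cycloheximide chase to a loading control. One common way to express such a chase, sketched here as an assumption, is to divide each MFN2 signal by its loading-control signal and then scale every time point to the first (time-zero) lane. The intensities and time points are hypothetical, and the legend does not state whether the authors rescaled to time zero.

```python
def relative_protein_remaining(target_signal, loading_signal):
    """Normalize target bands to the loading control, then to the first (t = 0) lane."""
    if len(target_signal) != len(loading_signal) or not target_signal:
        raise ValueError("need matching, non-empty lists of band intensities")
    if any(value <= 0 for value in loading_signal):
        raise ValueError("loading-control intensities must be positive")
    ratios = [t / l for t, l in zip(target_signal, loading_signal)]
    if ratios[0] == 0:
        raise ValueError("time-zero target signal must be non-zero")
    return [r / ratios[0] for r in ratios]


# Hypothetical densitometry for three lanes of a CHX chase (e.g. 0, 2 and 4 h).
mfn2_bands = [1000.0, 800.0, 500.0]
loading_control_bands = [100.0, 100.0, 125.0]
print(relative_protein_remaining(mfn2_bands, loading_control_bands))  # [1.0, 0.8, 0.4]
```

Normalizing to the loading control first corrects for unequal lane loading (the third lane above is overloaded), so a faster decline in the minus-C18:0 series reflects faster MFN2 turnover rather than loading differences.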
Extended Data Figure 4 | HUWE1 is required for hyperubiquitination of MFN2 in response to C18:0 withdrawal. a, Short interfering RNA (siRNA) depletion of other ubiquitin ligases targeting MFN (besides HUWE1, shown in Fig. 3) does not rescue the mitochondrial fragmentation induced by removal of C18:0 (top). Bottom, quantification. n = 15. b, siRNA depletion of PARK2 in HEK293 cells, as in HeLa cells (a), does not rescue the mitochondrial fragmentation induced by removal of C18:0 (top). Bottom, quantification. n = 15. c, HUWE1 knockdown efficiency controlled by detecting HUWE1 protein levels. See Supplementary Fig. 17 for image of the uncropped full western blot. d, Survival to pupation of Elovl6− mutants is fully rescued by ubiquitous expression (daughterless-GAL4) of RNAi targeting Huwe1 (CG8184). Elovl6 mutants expressing Huwe1 RNAi survive to pupation at expected Mendelian frequencies (χ² test, χ² = 0.86 < 3.841, the critical value at P = 0.05). Flies were grown on antifungal-free food. Values represent the average of four biological replicates. Scale bars, 10 µm. a, b, d, *P < 0.05, **P < 0.01, ***P < 0.001, not significant (NS) P ≥ 0.05, two-tailed t-test. Error bars represent s.d.
Extended Data Figure 5 | Elovl6− mutants have other Mfn loss-of-function phenotypes, such as reduced mitochondria-associated membranes and abnormal cristae. a, The mitochondria-associated membrane (MAM) band is strongly reduced or absent in Percoll gradients of crude mitochondrial fractions from Elovl6-mutant animals, compared with controls. b, Purity controls of mitochondrial preparations show that pure mitochondrial fractions lack markers of other subcellular organelles such as calnexin (endoplasmic reticulum) and lamin (nuclei). See Supplementary Fig. 17 for image of the uncropped full western blot. Right, quantification shows that levels of the endoplasmic reticulum marker calnexin are reduced in crude mitochondrial fractions from Elovl6− mutants compared with controls, in agreement with reduced MAMs in Elovl6− mutants. Values show densitometry ratios of calnexin levels in crude mitochondrial fractions, normalized to total lysate calnexin. c, Electron microscopy of Drosophila S2 cell mitochondria (left) reveals cristae abnormalities in Elovl6-depleted cells. Middle, quantification (n = 200). Significance of the difference was calculated with a Mann–Whitney test (*P < 0.05). Right, average circularity of mitochondria was calculated with ImageJ software. Scale bar, 1 µm. n = 200, ****P < 0.0001, two-tailed t-test. Error bars show s.d.
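Panel c reports average mitochondrial circularity from ImageJ, whose shape descriptors define circularity as 4π × area / perimeter². That value is 1 for a perfect circle and approaches 0 for elongated outlines. The sketch below applies the definition to per-mitochondrion area and perimeter measurements; the profiles are hypothetical, and this is not a reproduction of the authors' ImageJ workflow.

```python
import math
from statistics import mean


def circularity(area, perimeter):
    """ImageJ-style circularity: 4*pi*area / perimeter**2 (1.0 for a perfect circle)."""
    if area < 0 or perimeter <= 0:
        raise ValueError("area must be non-negative and perimeter positive")
    return 4 * math.pi * area / perimeter ** 2


def mean_circularity(shapes):
    """Average circularity over (area, perimeter) pairs, one pair per mitochondrion."""
    return mean(circularity(area, perimeter) for area, perimeter in shapes)


# Hypothetical profiles (µm², µm): a round section of radius 0.25 µm and an
# elongated 0.2 µm x 2.0 µm rectangle.
round_profile = (math.pi * 0.25 ** 2, 2 * math.pi * 0.25)
elongated_profile = (0.2 * 2.0, 2 * (0.2 + 2.0))
print(round(circularity(*round_profile), 3))      # 1.0
print(round(circularity(*elongated_profile), 3))  # 0.26
print(mean_circularity([round_profile, elongated_profile]))
```

A rise in average circularity therefore indicates that mitochondrial profiles have become rounder and less tubular.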
Extended Data Figure 6 | C18:0 removal does not lead to endoplasmic reticulum stress, and inhibiting the UPR does not inhibit mitochondrial fragmentation upon C18:0 removal. a, C18:0 removal for 24 h does not induce expression of UPR target genes, quantified by quantitative RT-PCR and normalized to RPL19. BiP (also known as HSPA5) is a readout for IRE1 (also known as ERN1) activation, CHOP (also known as DDIT3) is a readout for ATF6 activation, and PERK (also known as EIF2AK3) is a readout of its own activation due to a positive transcriptional feedback loop. Tunicamycin serves as a positive control. The y axis is displayed on a logarithmic scale to fit all data points on one graph. The experiment was done in triplicate. b, p-eIF2α, a UPR marker, does not increase upon removal of C18:0, whereas it is induced by tunicamycin, a positive control. See Supplementary Fig. 18 for image of the uncropped full western blot. c, Knocking down mediators of the UPR does not inhibit mitochondrial fragmentation upon C18:0 removal. HeLa cells were transfected with either control siRNAs or siRNAs targeting UPR mediators as indicated. Left, the mitochondrial fragmentation index; right, representative images. n = 15. d, Inhibiting endoplasmic reticulum stress by means of a chemical chaperone, TUDCA, does not rescue mitochondrial fragmentation upon C18:0 removal. HeLa cells were pre-treated with 500 µg ml⁻¹ TUDCA 30 min before delipidated serum treatment. Left, mitochondrial fragmentation index (n = 15); right, representative images. a, c, d, *P < 0.05, **P < 0.01, two-tailed t-test. Error bars show s.d.
Extended Data Figure 7 | TFR1 is the mediator of C18:0 signalling to mitochondrial morphology. a, C17:0-azide is a functional analogue of C18:0 in that it induces mitochondrial fusion in HeLa cells, whereas other C18:0 derivatives do not. Cn:0-azide = HO₂C(CH₂)ₙ₋₁N₃; Cn:0-alkyne = HO₂C(CH₂)ₙ₋₃C≡CH. b, TFR1 is the most enriched protein in a C17:0-azide pulldown, and it regulates mitochondrial morphology. HeLa cells were treated with C17:0-azide for 2 h, and covalently bound proteins were precipitated by lysing cells under denaturing conditions (8 M urea) and linking the C17:0-azide to an alkyne-labelled resin via click chemistry (left). Precipitated proteins were identified by mass spectrometry, and peptide counts were normalized to peptide counts in a negative control pulldown from cells not treated with C17:0-azide (n = 3) (right; column 2). Indicated proteins were also tested by siRNA-mediated knockdown for effects on mitochondrial morphology (column 3). c, TFR1 is covalently bound to the C18:0 derivative C17:0-azide in HeLa cells in a ZDHHC6-dependent manner. HeLa cells were treated with C17:0-azide for 2 h and subsequently lysed under denaturing conditions (8 M urea). Similar to b, the C17:0-azide was 'clicked' onto a biotinylated alkyne, and the labelled proteins were pulled down with streptavidin beads. After washing, bound proteins were eluted off the beads in Laemmli buffer containing biotin and analysed by immunoblotting. The palmitic acid analogue C15:0-azide was used as a positive control, since TFR1 is known also to be palmitoylated. C17:0-azide pulls down more TFR1 than equal amounts of C15:0-azide, indicating that TFR1 palmitoylation cannot account for the C17:0 signal. The C17:0-azide–TFR1 interaction is completely blunted upon ZDHHC6 knockdown. See Supplementary Fig. 18 for image of the uncropped full western blot. d, TFR1 is required for C18:0 removal to induce mitochondrial fragmentation. HeLa cells were transfected with either control or TFR1-targeting siRNAs before treatment with medium containing delipidated serum plus or minus C18:0. Representative images are shown here and quantification of mitochondrial fragmentation is shown in Fig. 3h. n = 15.
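Panel b ranks pulldown hits by peptide counts normalized to a negative-control pulldown from cells not treated with C17:0-azide. A minimal sketch of such a ranking follows. The pseudocount added to avoid division by zero, the single-replicate input, and the protein labels and counts are hypothetical; the legend does not say how the authors handled proteins absent from the control pulldown.

```python
def rank_by_enrichment(probe_counts, control_counts, pseudocount=1.0):
    """Rank proteins by peptide-count enrichment over a no-probe control pulldown.

    probe_counts, control_counts: dict mapping protein -> peptide count.
    pseudocount (an assumption, not from the source) avoids division by zero
    for proteins that are absent from the control pulldown.
    """
    ratios = {}
    for protein, count in probe_counts.items():
        background = control_counts.get(protein, 0)
        ratios[protein] = (count + pseudocount) / (background + pseudocount)
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical peptide counts for three unnamed proteins.
probe = {"A": 39, "B": 4, "C": 20}
control = {"C": 19}
for protein, ratio in rank_by_enrichment(probe, control):
    print(protein, ratio)  # A 40.0, then B 5.0, then C 1.05
```

Normalizing to the no-probe control separates proteins specifically captured by the lipid probe from abundant proteins that bind the beads nonspecifically, such as protein C here.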
Extended Data Figure 8 | C18:0 removal does not affect iron uptake or delivery. a, HeLa cells cannot grow in the presence of deferoxamine (DFO), an iron chelator (top), whereas they grow in delipidated serum lacking C18:0 at a rate comparable to cells in control medium (bottom). n = 3. b, Treatment of HeLa cells with medium containing delipidated serum (lacking C18:0) for 24 h does not lead to transcriptional activation of iron-deficiency response genes (bottom), which are activated by DFO-mediated iron chelation (24 h) as a positive control (top). n = 3. c, Treatment of HeLa cells with medium containing delipidated serum for 24 h or 4 days does not lead to a drop in levels of succinate dehydrogenase B (SDHB), which contains an Fe–S cluster. See Supplementary Fig. 19 for image of the uncropped full western blot. d–f, Treatment of HeLa cells with medium containing delipidated serum for 24 h or 4 days does not lead to a drop in activities of enzymes containing lipoylated subunits (PDH and OGDH) (d, e) or Fe–S-cluster-containing subunits (SDH) (f). DFO treatment to chelate iron from the medium, or siRNA-mediated depletion of the enzymes, was used as a positive control (d–f, bottom). n = 4. g, Treatment of HeLa cells with medium containing delipidated serum (24 h) does not reduce transferrin uptake. Cells were treated with 25 µg ml⁻¹ Alexa-488-coupled transferrin for 30 min. Representative images (left) and quantification of the amount of transferrin per cell (right) (n = 5). h, Treatment of HeLa cells with medium containing delipidated serum (24 h) does not reduce association of transferrin-containing vesicles with mitochondria. Crude mitochondria were fractionated from cells growing in medium containing or lacking C18:0, and the amount of transferrin that copurifies with mitochondria was analysed and quantified by immunoblotting. See Supplementary Fig. 19 for image of the uncropped full western blot. a, b, d–g, *P < 0.05, ***P < 0.001, ****P < 0.0001, not significant (NS) P ≥ 0.05, two-tailed t-test. Error bars show s.d.
Extended Data Figure 9 | JNK signalling is required for mitochondrial
fragmentation induced by C18:0 removal. a, Treatment of HeLa cells with
1 µM gambogic acid does not induce apoptosis. 10 µM gambogic acid was used
as a positive control for apoptosis induction, assessed by cleaved caspase-3
levels. 1 µM gambogic acid neither induces caspase cleavage (shown here)
nor causes cells to die (data not shown). Cells were treated with 10 µM
gambogic acid for 1 h, or for 3 h at all other concentrations. See Supplementary
Fig. 20 for image of the uncropped full western blot. b, Activation of TFR1 by
treating cells with 1 µM gambogic acid leads to mitochondrial fragmentation
that is reversed by 1 h C18:0 pre-treatment. Representative images are
shown here and quantification of mitochondrial fragmentation is shown in
Fig. 3i (n = 15). c, Treatment of HeLa cells with C18:0 to inhibit TFR1
reduces JNK signalling activity, assayed by p-JNK levels on an
immunoblot. See Supplementary Fig. 20 for image of the uncropped full
western blot. d, Removal of C18:0, as well as treatment with gambogic acid,
induces shuttling of phosphorylated JNK into the nucleus. Cells were stained
with phospho-JNK antibody (left) and relative levels of nuclear to cytosolic
phospho-JNK signal were quantified (right) (n = 37 cells). ***P < 0.001,
two-tailed t-test. Error bars show s.d. e, JNK signalling is required for TFR1
activation to induce mitochondrial fragmentation. HeLa cells were treated with
the JNK inhibitor SP600125 30 min before gambogic acid treatment to
activate TFR1. Representative images are shown here and quantification of
mitochondrial fragmentation is shown in Fig. 3j (n = 15).
Extended Data Figure 10 | Dietary C18:0 improves Parkinson's disease
phenotypes of Pink1 and Parkin mutant flies. a, b, Dietary C18:0
supplementation (10%) significantly increases lifespan of male park25 (a) and
Pink1B9 (b) mutant flies. n = 8 × 10 animals. c, Dietary C18:0 supplementation
rescues ATP levels of 1-week-old male Pink1B9 mutant adult flies. n = 3 × 3
animals. d, Dietary C18:0 supplementation significantly improves locomotor
defects of 2-week-old male Pink1B9 mutant flies. Locomotion was quantified as
animals climbing past a threshold in a given amount of time (technical
duplicates, biological quadruplicates, ten animals per assay). e, Parkin loss of
function in flies leads to mitochondrial fragmentation, which is rescued by
dietary supplementation with C18:0. Guts from 14-day-old female control
or park25 mutant adult flies expressing mitoGFP and grown on food
supplemented with or without C18:0 (10%) were dissected and mitochondria
were imaged. Quantification of mitochondrial fragmentation is shown
(3 animals per condition, 6 optical areas per animal). b-d, Control flies are the
revertant line Pink1RV. Error bars show s.d. Not significant (NS) P > 0.05,
*P < 0.05.
GGGGCC repeat expansion in C9orf72 compromises
nucleocytoplasmic transport


Brian D. Freibaum*, Yubing Lu*, Rodrigo Lopez-Gonzalez, Nam Chul Kim, Sandra Almeida, Kyung-Ha Lee, Nisha Badders,
Marc Valentine, Bruce L. Miller, Philip C. Wong, Leonard Petrucelli, Hong Joo Kim, Fen-Biao Gao & J. Paul Taylor


The GGGGCC (G4C2) repeat expansion in a noncoding region of
C9orf72 is the most common cause of sporadic and familial forms
of amyotrophic lateral sclerosis and frontotemporal dementia.
The basis for pathogenesis is unknown. To elucidate the consequences
of G4C2 repeat expansion in a tractable genetic system,
we generated transgenic fly lines expressing 8, 28 or 58 G4C2-repeat-containing
transcripts that do not have a translation
start site (AUG) but contain an open reading frame for green
fluorescent protein to detect repeat-associated non-AUG (RAN)
translation. We show that these transgenic animals display
dosage-dependent, repeat-length-dependent degeneration in neuronal
tissues and RAN translation of dipeptide repeat (DPR) proteins,
as observed in patients with C9orf72-related disease. This
model was used in a large-scale, unbiased genetic screen, ultimately
leading to the identification of 18 genetic modifiers that encode
components of the nuclear pore complex (NPC), as well as the
machinery that coordinates the export of nuclear RNA and the
import of nuclear proteins. Consistent with these results, we found
morphological abnormalities in the architecture of the nuclear
envelope in cells expressing expanded G4C2 repeats in vitro and
in vivo. Moreover, we identified a substantial defect in RNA export
resulting in retention of RNA in the nuclei of Drosophila cells
expressing expanded G4C2 repeats and also in mammalian cells,
including aged induced pluripotent stem-cell-derived neurons
from patients with C9orf72-related disease. These studies show
that a primary consequence of G4C2 repeat expansion is the compromise
of nucleocytoplasmic transport through the nuclear pore,
revealing a novel mechanism of neurodegeneration.

To explore pathogenic mechanisms of disease initiated by C9orf72
repeat expansion in a genetically tractable model organism, we used
PhiC31 integrase-mediated insertion of 8, 28 or 58 copies of G4C2
repeats ((G4C2)8, (G4C2)28 and (G4C2)58, respectively) into specific
genomic loci of Drosophila. These repeats are not expressed at baseline
but are transcribed when GAL4 is introduced in trans by genetic
cross (Fig. 1a). The green fluorescent protein (GFP) coding sequence
without a start codon was placed in frame with a potential
poly(glycine-proline) (poly(GP)) dipeptide repeat (Fig. 1a), although
RAN translation of the sense strand may also occur in two other
reading frames, producing poly(glycine-arginine) (poly(GR)) and
poly(glycine-alanine) (poly(GA)).

We expressed these constructs in the Drosophila eye using GMR-GAL4
and observed a length- and dosage-dependent rough eye phenotype
(Fig. 1b) similar to that reported recently. Toxicity was also
observed when these repeats were expressed in other tissues.
Expression of (G4C2)58, but not (G4C2)8, in motor neurons using
OK371-GAL4 led to small larvae with significantly impaired locomotor
activity (Fig. 1c, d). The neuromuscular junctions (NMJs) were examined
with presynaptic and postsynaptic markers (Fig. 1e). Expression of
(G4C2)58 resulted in a significant decrease in bouton number and
total muscle area when compared with GFP or (G4C2)8 (Fig. 1f).
Additionally, active zones within the NMJ were markedly reduced in
larvae expressing (G4C2)58 compared with controls (Extended Data
Fig. 1a, b). Expression of (G4C2)58 by the pan-neuronal driver elav-GAL4
also resulted in dosage-dependent locomotor defects and NMJ
abnormalities (Extended Data Fig. 1c-f). Moreover, expression of G4C2
repeats by the muscle-specific driver MHC-GAL4 led to age-dependent
defects in indirect flight muscle, resulting in permanent abnormal wing
position as compared to controls (Extended Data Fig. 1g).

Figure 1 | G4C2 repeats induce length- and dosage-dependent
degeneration in Drosophila. a, Constructs expressing 8, 28 or 58 copies of
G4C2 repeats. b, (G4C2)58 causes a rough eye phenotype. c, Two copies of
(G4C2)58 expressed in motor neurons result in a decrease in larval size (left) and
locomotor activity (right). d, Distance travelled by larvae expressing repeats in
motor neurons. Values are mean ± s.e.m., n = 5 (control, (G4C2)8) or
6 ((G4C2)58) larvae (3 trials per larva), **P < 0.01, by one-way analysis of
variance (ANOVA), Tukey's post hoc test. e, Expression of two copies of
(G4C2)58 in motor neurons reduces bouton number. Scale bar, 25 µm. f, Bouton
number and muscle size in larvae expressing G4C2 repeats. Values are mean ±
s.e.m., n = 6 larvae (3 trials per larva), **P < 0.01, by one-way ANOVA,
Tukey's post hoc test.
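Figure 1 and later legends report one-way ANOVA followed by Tukey's post hoc test. As a hedged illustration of the first step only, the sketch below computes the one-way ANOVA F statistic for a set of groups (for example, distances travelled by larvae of each genotype). The function name and example values are hypothetical and are not data from this study; the post hoc comparison needs the studentized range distribution, which this sketch does not implement.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric sample groups.

    Hypothetical helper for illustration; it computes only the omnibus
    F statistic, not Tukey's post hoc comparisons.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical example values (arbitrary units), not measurements from this study
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(2, 6) = 27.0
```

Where SciPy is available, `scipy.stats.f_oneway` returns the same F statistic together with a P value, and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) performs the pairwise Tukey comparisons.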

In this fly model, RAN translation occurs in a G4C2-repeat-length-dependent
manner. Poly(GP)-GFP dipeptide repeats were detected
in flies expressing (G4C2)58 and to a lesser extent (G4C2)28, but not in
flies expressing equivalent levels of (G4C2)8 (Extended Data Fig. 2a,
b). Indeed, nuclear and cytoplasmic inclusions of poly(GP)-GFP
were present in tissues such as brain, muscle and salivary gland
expressing (G4C2)58 but not (G4C2)8 (Extended Data Fig. 2c-g). To
determine whether DPRs are sufficient to drive a degenerative phenotype,
we used codons alternative to G4C2 to generate new transgenic
flies directly expressing poly(GA), poly(GR) or poly(GP), with an
AUG start codon and amino-terminal GFP (Supplementary Table 1
and Extended Data Fig. 3a). Expression of neither GFP-(GP)47 nor
GFP-(GA)50 elicited a degenerative eye phenotype (Extended Data
Fig. 3a, b). Thus, while poly(GP)-GFP serves as a useful marker of
cells expressing (G4C2)58 RNA and producing DPR proteins, it is
unlikely to contribute substantially to the degenerative phenotype.
By contrast, expression of GFP-(GR)50 was highly toxic, resulting in
greater than 95% lethality, with the few escapers having severely
degenerated eyes (Extended Data Fig. 3a). These results are consistent with
recent reports indicating that poly(GR) is toxic in cultured cells and
Drosophila. In fact, immunoblotting with antibodies against DPRs
showed that flies expressing (G4C2)58 RNA produce poly(GR) in
addition to poly(GP) DPRs, although neither poly(GA) nor
poly(PR) was detected (Extended Data Fig. 3c, d). Thus, (G4C2)58
toxicity may be mediated through toxic RNA or poly(GR) produced
in our fly model, or a combination of these mechanisms.

Figure 2 | Genetic screen identifies multiple modifiers of (G4C2)58 toxicity
in the nucleocytoplasmic transport pathway. a, (G4C2)58 expression driven
by GMR-GAL4 causes rough eye phenotypes (top right), which are enhanced by
either Df(2R)1725/+ or Df(2R)1735/+ (middle). b, Overlapping genomic
region of deficiency lines Df(2R)1725 and Df(2R)1735. c, RNAi knockdown
(two independent Nup50 RNAi lines) and a genetic allele of Nup50 identify
Nup50 as an enhancer. d, The deficiency Df(3R)Antp17 (green), but not others
(grey), suppressed the (G4C2)58 phenotype. e, f, Identification of Ref1 as a
suppressor. g, Table summarizing modifiers and their functions. h, Suppressors
(green) and enhancers (red) of (G4C2)58 toxicity in the nucleocytoplasmic
trafficking pathway.

To gain unbiased insight into the pathogenic mechanisms underlying
(G4C2)58 toxicity, we performed a large-scale genetic screen to
identify genetic loci whose partial loss of function may strongly modify
the (G4C2)58-induced rough eye phenotype. To this end, flies expressing
(G4C2)58 in the eye were crossed with 372 chromosomal deficiency
lines spanning the entirety of the second and third chromosomes,
representing ~80% of the fly genome. In total, 5.1% of the deficiencies
suppressed the rough eye phenotype, whereas 8.8% enhanced it
(Supplementary Table 2).

Two deficiency lines on the second chromosome, Df(2R)1725 and
Df(2R)1735, each in heterozygotes, had no eye phenotype by themselves
but led to a marked enhancement of the (G4C2)58 rough eye
phenotype (Fig. 2a). The genomic regions covered by these deficiencies
partially overlap (Fig. 2b), suggesting that one or more genes in this
overlapping region are responsible for the observed enhancement.
After systematic evaluation of this candidate interval with classical
loss-of-function (LOF) alleles and RNA interference (RNAi) lines, we
identified the gene Nup50, the partial LOF of which enhanced the (G4C2)58
phenotype (Fig. 2c). The effect of Nup50 was specific to (G4C2)58, since
there was no rough eye phenotype present when Nup50 was knocked
down in eyes expressing GFP (Fig. 2c). Nup50 is a component of the
nuclear pore and also has a critical role in promoting protein nuclear
import through interaction with Importin β and Ran GTPase.
Consistent with the impact of Nup50 LOF on (G4C2)58-mediated
degeneration, a dominant-negative form of Ran (RanT24N) strongly
enhanced the (G4C2)58 rough eye phenotype (Extended Data Fig. 4a).
Moreover, the nuclear import factors Nup153 and Transportin, which
work together with Nup50 and Ran, were also identified as enhancers
of the (G4C2)58 rough eye phenotype (Extended Data Fig. 4a). These
genetic analyses suggest that protein import is compromised by
(G4C2)58 expression.

The strongest suppressor identified in the genetic screen was Ref1,
revealed by interrogation of the suppressor deficiency Df(3R)Antp17
(Fig. 2d, e). This deficiency also suppressed the strong rough eye
phenotype produced by two copies of (G4C2)58, as did heterozygosity
for a null allele of Ref1 (Fig. 2f). Ref1 and its human orthologue
ALYREF are RNA-binding proteins that associate with the 5′ end of
messenger RNAs to prevent degradation by the nuclear exosome and,
together with the TREX component CHTOP, facilitate delivery of fully
processed mRNAs to the nuclear pore receptor NXF1, which mediates
their export through the nuclear pore. Consistent with the impact
of Ref1 LOF on (G4C2)58-mediated degeneration, partial LOF in
Drosophila orthologues of NXF1 or CHTOP enhanced the rough eye
phenotype (Fig. 2g, h and Extended Data Fig. 5). The fact that the
(G4C2)58 phenotype is exacerbated by LOF in NXF1 or CHTOP orthologues,
but partially rescued by LOF of Ref1, may reflect the
dual role played by Ref1 in partitioning target RNAs between the
nuclear pore and the nuclear exosome. In support of this interpretation,
we identified LOF in Drosophila orthologues of human EXOSC3
and EXOSC10, which encode components of the nuclear exosome, as
strong enhancers of (G4C2)58-related toxicity (Fig. 2g, h and Extended
Data Fig. 5). Notably, LOF mutations in EXOSC3 in humans cause a
congenital form of motor neuron disease.

The Drosophila orthologues of the human cap-binding proteins
NCBP1, NCBP2 and ARS2, which mediate recruitment of the
TREX complex to the 5′ end of RNA to initiate RNA export, were
also identified as enhancers of (G4C2)58-mediated degeneration
(Fig. 2g, h and Extended Data Fig. 5). Moreover, we found that LOF
of either Nup107 or its binding partner Nup160, both of which are
nuclear pore components responsible for exporting some RNAs,
suppressed the (G4C2)58 rough eye phenotype (Extended Data Fig. 5).
One particularly notable enhancer was the Drosophila orthologue of
human GLE1, a critical mediator of RNA export at the nuclear pore
(Fig. 2g, h and Extended Data Fig. 5). Interestingly, complete LOF of
human GLE1 causes congenital motor neuron disease, whereas partial
LOF of human GLE1 is associated with adult-onset amyotrophic
lateral sclerosis (ALS).
Figure 3 | Drosophila salivary gland cells expressing (G4C2)58 exhibit
nuclear envelope abnormalities and accumulation of nuclear RNA. a, The
effect of (G4C2)58 expression on Nup107 localization. Scale bar, 50 µm.
b, (G4C2)58 expression causes abnormal nuclear envelope morphology. Scale
bar, 50 µm. c, Nuclear envelope phenotype in cells expressing either (G4C2)8
(n = 251 cells) or (G4C2)58 (n = 127 cells). Values are mean ± s.e.m. from
3 biological replicates, **P < 0.001 by Student's t-test. d, Accumulation of total
nuclear RNA relative to cytoplasmic RNA (red). Knockdown (KD) of Ref1
led to a partial rescue of nuclear RNA accumulation. Scale bar, 25 µm.
e, f, Relative intensity of total nuclear RNA versus cytoplasmic RNA in cells
from random fields of 5 larvae per genotype. Values are mean ± s.e.m.,
n = 30 cells (60 cells for (G4C2)58/Nup107 KD and (G4C2)58/Ref1 KD),
*P < 0.01, **P < 0.001 by one-way ANOVA, Tukey's post hoc test.
The identification of TREX and nuclear pore proteins as genetic
modifiers strongly implies that not only nuclear import but also
nuclear export of RNAs and proteins is compromised by (G4C2)58
expression. Consistent with this latter notion, we found that LOF of
Crm1 (also called emb), a gene encoding a major receptor for the
export of RNAs and proteins, resulted in strong enhancement of the
(G4C2)58 phenotype (Extended Data Fig. 4b). Moreover, expression of
(G4C2)58 in motor neurons caused exquisite sensitivity to leptomycin B,
an inhibitor of nuclear export (Extended Data Fig. 4c). In total, the
genetic screen and follow-up analyses identified 18 modifier genes
within the pathway of nucleocytoplasmic transport, and RNA export in
particular, highlighting this system as an important target of
(G4C2)58-related toxicity (Fig. 2g, h and Supplementary Table 3). In
particular, the identification of four strong suppressors of (G4C2)58
indicates that compromised nucleocytoplasmic transport is an important
causal pathway responsible for degeneration.

In follow-up to our genetic studies, RFP-Nup107 was expressed in
salivary glands using Fkh-GAL4. In cells expressing (G4C2)8, Nup107
labels a distinct nuclear boundary that is morphologically indistinguishable
from that in cells expressing GFP (Fig. 3a). In contrast, in cells
expressing (G4C2)58, the nuclear envelope exhibited a wrinkled
appearance and in many nuclei Nup107 formed inclusions
near the nuclear envelope (Fig. 3a). To explore nuclear architecture
further, the nuclear envelope was visualized by immunostaining of
endogenous Lamin C. A total of 43.7% of cells expressing (G4C2)58
showed an abnormal, ‘frayed’ nuclear envelope phenotype, whereas
this phenotype was observed in only 7.1% of cells expressing (G4C2)8
(Fig. 3b, c and Extended Data Fig. 6a). Thus, (G4C2)58 expression
causes defects in the architecture of the nuclear envelope and
mislocalization of nucleoporins. Moreover, cells expressing one copy of
(G4C2)58 showed a significant increase in the ratio of nuclear to
cytoplasmic RNA in comparison to cells expressing GFP or (G4C2)8
(Fig. 3d, e). Expression of two copies of (G4C2)58 resulted in a more
pronounced retention of nuclear RNA (Fig. 3d, e). Importantly, depletion
of Ref1 by RNAi partially suppressed the nuclear RNA retention
phenotype in cells expressing (G4C2)58, reducing nuclear RNA density
to control levels (Fig. 3d-f). Thus, G4C2 length-dependent degeneration
in Drosophila is accompanied by nuclear retention of RNA, consistent
with the results of the unbiased genetic screen.

Similar retention of nuclear RNA was observed in HeLa and
HEK293T cells upon transient expression of (G4C2)58 but not
(G4C2)8 (Fig. 4a-e and Extended Data Fig. 6b). This export
defect was further illustrated using an alternative approach that monitors
the fate of newly synthesized RNA by metabolic labelling. In HeLa cells
expressing (G4C2)58 we observed a significant decrease in the export of
nascent RNA in comparison to control cells expressing GFP or (G4C2)8,
although the total levels of metabolically labelled transcripts were
comparable (Fig. 4a-c and Extended Data Fig. 6c). Moreover,
poly(A)+ mRNA, as detected by an oligo-dT probe, accounts for at
least some portion of the retained RNA in mammalian cells expressing
(G4C2)58 (Fig. 4d, e).
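The metabolic-labelling comparison separates two quantities: how much nascent RNA a cell makes (total EU signal) and how much of it has reached the cytoplasm. A minimal sketch of that bookkeeping follows, assuming per-cell integrated EU intensities have already been measured inside nuclear and cytoplasmic masks; the `Cell` class, the function name and all numbers are hypothetical illustrations, not the authors' analysis pipeline or data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Cell:
    nuclear: float      # integrated EU signal inside the nuclear (DAPI) mask
    cytoplasmic: float  # integrated EU signal in the cytoplasm


def export_summary(cells):
    """Return (mean total EU signal, mean cytoplasmic fraction) for a group of cells."""
    totals = [c.nuclear + c.cytoplasmic for c in cells]
    # Fraction of each cell's labelled RNA that has left the nucleus
    fractions = [c.cytoplasmic / t for c, t in zip(cells, totals)]
    return mean(totals), mean(fractions)


# Hypothetical values: equal labelling, but less RNA exported in the second group
control = [Cell(60.0, 40.0), Cell(50.0, 50.0)]
repeat_expressing = [Cell(80.0, 20.0), Cell(75.0, 25.0)]
print(export_summary(control))            # (100.0, 0.45)
print(export_summary(repeat_expressing))  # (100.0, 0.225)
```

Reporting both values side by side is what allows a reduced export fraction to be interpreted as an export defect rather than as reduced transcription.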

Advances in induced pluripotent stem cell (iPSC) technology permit
the generation of neurons from patients, including those with
frontotemporal dementia (FTD) and ALS. To examine further the
impact of expanded G4C2 toxicity on nucleocytoplasmic transport in a
more relevant human cell type from C9orf72 patients, we examined the
subcellular distribution of total RNA in 2-month-old cortical neurons,
which are affected in FTD, a frequent clinical feature of C9orf72-related
disease. These neurons were differentiated from previously
characterized as well as newly generated and characterized iPSC
lines derived from 5 C9orf72 patients and 3 controls (Extended Data
Figs 7 and 8 and Supplementary Tables 4 and 5). Consistent with our
observations in fly, HEK293T and HeLa cells, this analysis confirmed a
significant increase in the nuclear to cytoplasmic ratio of RNA in
C9orf72 patient cells in comparison to controls (Fig. 4f, g). On average,
C9orf72 neurons showed a 35% increase in the nuclear to cytoplasmic
ratio of RNA density (Fig. 4g). Notably, this difference in the ratio of
nuclear to cytoplasmic RNA was not observed in fibroblasts obtained
from C9orf72 patients when compared to controls, consistent with low
levels of C9orf72 expression in fibroblasts (Extended Data Fig. 9a-c).
Thus, neurons derived from patients with C9orf72-related disease
mirror the nucleocytoplasmic transport defects identified by unbiased
approaches in a genetic model of disease.
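The 35% figure is a relative change in a ratio. One way to compute such a value is sketched below, assuming that RNA-FISH signal density (intensity per unit area) has been measured in the nucleus and cytoplasm of each neuron, that cells are first averaged within each iPSC line, and that line means are then averaged within each group. That aggregation order, the line names and the numbers are assumptions for illustration only.

```python
from statistics import mean


def line_nc_ratio(cells):
    """Mean nuclear-to-cytoplasmic RNA density ratio for one iPSC line.

    cells: list of (nuclear_density, cytoplasmic_density) tuples, one per neuron.
    """
    return mean(nuc / cyto for nuc, cyto in cells)


def percent_increase(patient_lines, control_lines):
    """Percentage change of the mean patient ratio relative to the mean control ratio."""
    patient = mean(line_nc_ratio(c) for c in patient_lines.values())
    control = mean(line_nc_ratio(c) for c in control_lines.values())
    return (patient / control - 1.0) * 100.0


# Hypothetical densities (arbitrary units), not data from this study
controls = {"ctrl_A": [(2.0, 2.0), (3.0, 3.0)], "ctrl_B": [(1.0, 1.0)]}
patients = {"pat_A": [(2.6, 2.0)], "pat_B": [(2.8, 2.0)]}
print(round(percent_increase(patients, controls), 1))  # 35.0
```

Averaging within each line first keeps a line with many imaged cells from dominating the group mean; pooling all cells directly is an equally simple alternative that weights lines differently.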

The NPC is the largest macromolecular complex in eukaryotic cells,
consisting of multiple copies of more than 30 different proteins.
Some structural components of the NPC have an exceedingly long
half-life measured in years. Indeed, nuclear pores are a particularly
intriguing target for age-related diseases affecting post-mitotic cells
such as neurons, which are at risk of accumulating damage to these
long-lived components over extended periods of time. Nucleocytoplasmic
transport through the nuclear pore is vital to cell viability, and primary
defects in this function cause a host of human diseases with phenotypes
ranging from cancer to neurological disease.

Important questions regarding the pathogenesis of C9orf72-related
disease remain, in particular the relative contributions of expanded
G4C2 RNA, DPRs or other mechanisms to the nuclear pore defect.
It also remains to be determined whether defects in
nucleocytoplasmic transport and other reported defects, such as
sequestration of specific RNA-binding proteins, nucleolar stress and
ER stress, are parallel processes or directly connected with each other.
The rapid pace of discovery in this area of disease biology suggests that
answers to these questions will come soon.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


The experiments were not randomized, and the investigators were not blinded to 
allocation during experiments and outcome assessment. No statistical methods 
were used to predetermine sample size. 

Generation of Drosophila lines. To generate transgenic Drosophila expressing
(G4C2)8, (G4C2)28 and (G4C2)58, G4C2 repeats of the respective length were cloned
downstream of the UAS promoter and upstream of the EGFP sequence with the
start codon removed in the plasmid pUAST-ATTB. Transgenic Drosophila lines
were generated by BestGene Inc. such that the transgene was inserted using the
PhiC31 integrase into either the attP2 site on chromosome 3 (locus 68A4) or the
attP40 site on chromosome 2 (locus 25C6). GFP-(GP)47, GFP-(GA)50 and
GFP-(GR)50 plasmids were subcloned into the pUAST-ATTB plasmid and inserted
into the attP2 site by BestGene Inc. Flies were raised at 25 °C on a standard diet. A
complete listing of Drosophila stocks that modify the (G4C2)58 phenotype is given
in Supplementary Table 3. For genetic interaction studies, the combined stock
(GMR-GAL4/CyO; UAS-(G4C2)58/TM6,Tb) was crossed with deficiency stocks or
individual mutants, and phenotypic analysis was performed on flies aged 24-48 h
in both males and females. Knockdown efficiency of RNAi lines was measured by
quantitative RT-PCR (Extended Data Fig. 10 and Supplementary Table 6). For
nuclear envelope morphology studies, the recombined stocks (Fkh-GAL4,
UAS-GFP/TM6,Tb; Fkh-GAL4, UAS-(G4C2)8/TM6,Tb; and Fkh-GAL4,
UAS-(G4C2)58/TM6,Tb) were crossed with UAS-mRFP-Nup107 transgenic flies or
immunostained for endogenous Lamin C.
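Knockdown efficiency by quantitative RT-PCR is commonly expressed with the comparative Ct (2^−ΔΔCt) method. The text does not state which quantification method or reference gene was used, so the sketch below is only an assumption-laden illustration: it presumes roughly 100% amplification efficiency and a single reference gene, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of the target transcript in knockdown vs control (2^-ddCt).

    Assumes ~100% PCR efficiency for both target and reference amplicons.
    """
    delta_kd = ct_target_kd - ct_ref_kd        # normalize knockdown sample to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control sample to reference gene
    delta_delta = delta_kd - delta_ctrl
    return 2.0 ** (-delta_delta)


def knockdown_efficiency(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Percentage reduction of target mRNA relative to control."""
    remaining = relative_expression(ct_target_kd, ct_ref_kd,
                                    ct_target_ctrl, ct_ref_ctrl)
    return (1.0 - remaining) * 100.0


# Hypothetical Ct values: after normalization the target appears 2 cycles later
print(knockdown_efficiency(27.0, 18.0, 25.0, 18.0))  # 75.0
```

Each extra cycle after normalization halves the estimated transcript level, so a ΔΔCt of 2 corresponds to 25% remaining mRNA; if primer efficiencies differ substantially from 100%, an efficiency-corrected model would be more appropriate.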

Drosophila stocks. Deficiency stocks were obtained from the Bloomington
Drosophila Stock Center deficiency kit. RNAi lines were obtained from either
the Bloomington Drosophila Stock Center or the Vienna Drosophila RNAi
Center. UAS-RanT24N and Fkh-GAL4 flies were provided by K. S. McKim and
E. Baehrecke, respectively.

Drosophila eye, muscle and neuronal phenotype analysis. Phenotypic analysis
of G4C2 repeat expression in the Drosophila eye, muscle and salivary gland was
assessed by crossing (G4C2)8, (G4C2)28 and (G4C2)58 lines to GMR-GAL4,
MHC-GAL4 and Fkh-GAL4, respectively. Neuronal expression was achieved by crossing
(G4C2)8, (G4C2)28 and (G4C2)58 lines to elav-GAL4 (pan-neuronal driver) or
OK371-GAL4 (motor neuron driver). Eye phenotypes were imaged by light
microscopy and muscle phenotypes were visually assessed by wing posture (n = 30
controls, 21 (G4C2)8, 50 (G4C2)28 and 24 (G4C2)58). Eye phenotypes shown are
representative images from Drosophila crosses. Crosses were performed twice
to validate the specific phenotype. Neuromuscular synaptic bouton number, active
zone density and crawling ability were measured as previously described,
with slight modification. To count type Ib synaptic bouton number, each genotype
was double stained presynaptically with anti-HRP-Cy3 (1:200, Jackson
ImmunoResearch) and postsynaptically with anti-Discs large 1 (1:50, DSHB).
Synaptic boutons of muscle 4 in abdominal segments 2, 3 and 4 (A2-A4)
were imaged with a Marianas spinning disc microscope, and maximum projection
images were used to count synaptic boutons. Active zone area and
presynaptic area stained by anti-Bruchpilot (NC82, 1:100, DSHB) and
anti-HRP-Cy3 were measured with ImageJ, and the ratio of active zone area to
presynaptic area was calculated. Muscle 4 in abdominal segments 2, 3 and 4 (A2-A4)
was analysed. To examine crawling ability, 4-7 wandering third instar larvae for each
group were collected, washed and placed onto a 3% agarose gel in a 10 cm
dish. After 5 min acclimation, larval crawling behaviour was recorded by a digital
camera for 30 s (15 frames per second (fps)). Each larva was tested three times
(3 technical replicates). Moving distances of each larva were manually measured
with ImageJ.
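Crawling distance was measured manually in ImageJ. For readers who track larvae automatically instead, a minimal sketch of the equivalent computation follows: at 15 fps for 30 s each trial yields 450 frames, the path length is the sum of frame-to-frame displacements, and each larva's value is the mean of its three trials. The tracking input format and function names are hypothetical and do not describe the authors' procedure.

```python
import math
from statistics import mean

FPS = 15
DURATION_S = 30
FRAMES_PER_TRIAL = FPS * DURATION_S  # 450 frames per recording


def path_length(track):
    """Sum of straight-line steps between consecutive (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def larva_distance(trials):
    """Mean path length over the technical replicates (trials) for one larva."""
    return mean(path_length(t) for t in trials)


# Hypothetical positions in mm, heavily subsampled for brevity
trials = [
    [(0, 0), (3, 4), (3, 10)],   # 5 + 6 = 11 mm
    [(0, 0), (0, 9)],            # 9 mm
    [(0, 0), (6, 8), (6, 11)],   # 10 + 3 = 13 mm
]
print(FRAMES_PER_TRIAL, larva_distance(trials))  # 450 11.0
```

Summing per-frame steps measures the full path travelled rather than net displacement, which matters for larvae that turn or pause during the recording.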

Immunofluorescence analysis of salivary gland cells. Dissected salivary glands
from wandering third instar larvae were fixed with 4% paraformaldehyde (PFA)
at room temperature for 15 min, washed three times with PBT (0.1% Triton X-100
in phosphate-buffered saline (PBS)), and blocked for 1 h at room temperature
with 10% normal goat serum (Sigma) diluted in PBT. Tissues were then incubated
overnight at 4 °C with mouse anti-Lamin C (1:30, LC28.26; Developmental
Studies Hybridoma Bank (DSHB)). After three washes in PBT (10 min each),
tissues were incubated for 1 h at room temperature with secondary antibodies
(goat anti-mouse Alexa Fluor 568, 1:200; Invitrogen) diluted in the blocking
solution. Tissues were washed three times in PBS and mounted with
VECTASHIELD (Vector Laboratories). Fluorescence signals were examined with
an Olympus IX70 microscope or a Leica TCS SP5 II laser scanning confocal
microscope.

Visualization of RAN products in muscle cells. To visualize poly(GP)-GFP in
the thorax of flies expressing (G4C2)58, flies were cleared overnight using a modified
ScaleA2 solution (2 M urea, 10% glycerol, 0.1% Triton X-100). RAN peptides were
visualized using light sheet microscopy. To visualize RAN products at high
magnification, thoraxes were dissected and stained with the same immunofluorescence
protocol described above for salivary glands, but using anti-Lamin (1:100, ADL 67.10;
Developmental Studies Hybridoma Bank (DSHB)) to visualize the nucleus and
phalloidin (1:40, Life Technologies) to visualize the muscle structure.

Immunoblots. Adult flies were frozen with dry ice and vortexed to remove the head
or thorax. Samples from each genotype were homogenized in RIPA buffer with
protease inhibitor cocktail added. Samples were mixed with 4× sample buffer (1 M
Tris-HCl (pH 6.8), 8% SDS, 40% glycerol, 0.1% bromophenol blue), boiled for
5 min, separated on an SDS gel and transferred to a membrane. The membrane was
blocked, probed with primary antibody, and incubated with secondary antibody.
The signal was visualized with either chemiluminescent substrate (SuperSignal West
Pico; Pierce) or an Odyssey Fc (Li-Cor). RAN products were visualized by
dot blot using the previously described protocol. Primary western blot antibodies
were anti-GFP (AB3080, Millipore or SC-9996, Santa Cruz Biotechnology),
anti-poly(GP), anti-poly(GA) or anti-poly(GR) antibodies, anti-β-actin antibody
(4967; Cell Signaling Technology) or anti-actin antibody (sc-1616, Santa Cruz
Biotechnology).

RNA in situ hybridization. Dissected salivary glands, cortical neurons, HeLa or
293T cells (obtained from ATCC) were fixed using 1% paraformaldehyde on ice for
5 min, followed by a second fixation in 1% paraformaldehyde plus 0.05% NP40 for
an additional 5 min. Samples were then transferred to 70% ethanol and stored at
−20 °C until needed. The probes for global RNA in Drosophila or human cells were
prepared from genomic DNA derived from w1118 flies or 293T cells, respectively.
Both probes were labelled by nick translation using Alexa Fluor 594 dUTP. To
hybridize the probes, 70 ng of the labelled probes (the human probe was combined
with 2 µg of human Cot-1 DNA) were suspended in 10 µl of hybridization buffer
consisting of 50% formamide, 2× SSC and 10% dextran sulfate. Fixed cells were
dehydrated in 70%, 80% and 100% ethanol for 2 min each before hybridization.
Probes suspended in hybridization buffer were denatured at 70 °C for 5 min, then
applied to the dehydrated cells and hybridized at 37 °C overnight. After
hybridization, samples were washed in 50% formamide, 2× SSC at 37 °C for 5 min,
then briefly rinsed in room temperature PBS and mounted on slides with DAPI. To
detect poly(A) mRNA, HeLa cells were fixed with 4% paraformaldehyde followed by
ice-cold methanol and 70% ethanol for 10 min each. The cells were treated with
1 ng µl−1 of 5′-labelled Cy3-oligo-dT(20) at 37 °C for 1 h. Samples were washed in
room temperature PBS and mounted on slides with DAPI. Images were captured
using either wide-field fluorescence microscopy or a Marianas spinning disc
microscope. An extended depth of focus function was used to combine all
components of individual microscope fields.

Pulse-chase of newly synthesized RNA. HeLa cells were transfected using Fugene HD
(Promega) with plasmids expressing GFP, (G4C2)8 or (G4C2)58 that had been
subcloned into pcDNA3.1. After 48 h incubation, cells were treated with 1 mM
5-ethynyl uridine (EU) for 1 h. RNA was then visualized at the indicated time
points using the Click-iT detection kit (Invitrogen) as recommended by the
manufacturer. To visualize GFP signal, cells were co-stained using monoclonal
anti-GFP (SC-9996, Santa Cruz Biotechnology) as a primary antibody.

iPSC lines. iPSC lines from 2 control subjects and 4 G4C2 repeat expansion
carriers have been previously characterized (see Supplementary Table 4).
Integration-free iPSCs were generated from fibroblasts of one G4C2 expansion
carrier and one control subject using episomal plasmids with the reprogramming
factors Oct4, Sox2, Klf4, Lin28 and L-Myc, as described previously; this procedure
was approved by the University of Massachusetts Medical School Institutional
Biosafety Committee. The use of human fibroblasts was approved by the UCSF
Institutional Review Board and informed consent was obtained from all subjects.
After reprogramming, characterization of iPSCs showed that the cells have a normal
karyotype, express pluripotency markers and have the capacity to differentiate
into cells of the three germ layers, as described by Almeida et al. All the iPSC
lines used here were tested regularly and were free of mycoplasma contamination.

Cortical neuronal cultures. Four iPSC lines from three control subjects and six iPSC lines from five C9orf72 carriers were differentiated to cortical neurons as described earlier (see also Supplementary Table 5). Neuronal cultures were aged for 8 weeks before fixation for RNA-FISH.

Immunostaining for iPSCs and iPSC-derived neurons. For immunostaining, cells were fixed in 4% paraformaldehyde for 15 min and then permeabilized with 0.3% Triton X-100 for 5 min. Cells were blocked with 5% bovine serum albumin for 30 min and incubated with primary antibodies overnight at 4 °C. The following primary antibodies were used: mouse anti-OCT4 1:100 (Santa Cruz Biotechnology), goat anti-NANOG 1:100 (R&D Systems), mouse anti-SSEA4 1:100 (Abcam), rabbit anti-desmin 1:100 (Thermo Scientific), mouse anti-βIII-tubulin 1:500 (Promega), mouse anti-α-fetoprotein 1:200 (R&D Systems), mouse anti-MAP2 1:500 (Sigma), goat anti-ChAT 1:200 (EMD Millipore) and rabbit anti-VGLUT1 1:500 (Synaptic Systems). After incubation with primary antibodies, cells were washed with PBS three times and incubated with Alexa Fluor secondary antibodies 1:500 (Invitrogen) for 1 h at room temperature, followed by counterstaining with DAPI.

©2015 Macmillan Publishers Limited. All rights reserved

Fluorescence in situ hybridization. FISH in iPSCs derived from control subjects and C9orf72 expansion carriers using a Cy3-conjugated (GGCCCC) probe was performed as described previously.

Quantification. For salivary glands, 30 cells from at least 5 individual salivary glands for each genotype were used for quantitative analysis. ImageJ software was used to measure the ratio of nuclear to cytoplasmic RNA by comparing the total RNA signal measured in the nucleus with that in the cytoplasm. DAPI was used to mark the nucleus. For cortical neurons, the ratio of nuclear to cytoplasmic RNA density was calculated by measuring the density of the RNA signal in both the nucleus and the cell body using ImageJ software, with DAPI marking the nuclear boundary. At least 15 neurons were measured from each individual. Error bars in all quantifications represent the standard error of the mean.
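The two ratios described above differ only in whether the signal is integrated over each compartment (total RNA, salivary glands) or normalized by compartment area (density, cortical neurons). The sketch below restates them with NumPy instead of ImageJ, as an illustration only. It assumes the FISH image and the DAPI-derived nuclear mask and cell mask are already available as arrays; the function names are hypothetical. The summary uses the sample standard deviation to compute the standard error of the mean.

```python
import numpy as np

def nuc_cyto_ratios(image, nucleus_mask, cell_mask):
    """Return (total-signal ratio, density ratio) for a single cell.

    image: 2D array of RNA-FISH intensities.
    nucleus_mask: boolean mask of the DAPI-defined nucleus.
    cell_mask: boolean mask of the whole cell (nucleus plus cytoplasm).
    """
    nucleus = nucleus_mask & cell_mask
    cytoplasm = cell_mask & ~nucleus_mask
    nuc_total = image[nucleus].sum()
    cyto_total = image[cytoplasm].sum()
    # Integrated signal in each compartment (salivary-gland style ratio).
    total_ratio = nuc_total / cyto_total
    # Signal per pixel in each compartment (cortical-neuron style ratio).
    density_ratio = (nuc_total / nucleus.sum()) / (cyto_total / cytoplasm.sum())
    return total_ratio, density_ratio

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)
```

Per-cell ratios from `nuc_cyto_ratios` would be pooled per genotype or per individual and summarized with `mean_sem`. Reporting both ratios makes it clear whether a difference reflects more nuclear RNA overall or denser nuclear RNA.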
Extended Data Figure 1 | Expression of G4C2 repeats induces length-dependent phenotypes in Drosophila. a, Expression of (G4C2)58 in Drosophila motor neurons using the OK371-GAL4 driver leads to a significant reduction in active zones, as immunostained by the anti-Bruchpilot antibody NC82 and anti-HRP. Scale bar, 50 µm. b, Quantification of active zones (n = 6 individual larvae for control, 4 for 2× (G4C2)8, and 6 for 2× (G4C2)58). Values are mean ± s.e.m. **P < 0.01, one-way ANOVA, Tukey's post hoc test. c, Pan-neuronal expression of (G4C2)58 repeats using the elav-GAL4 driver induces a dosage-dependent decrease in larval size (left) and in locomotor activity measured over 30 s (right). d, Quantification of the distance travelled by third instar larvae reveals that expressing two copies of (G4C2)58 results in a significant deficit in locomotor activity. Values are mean ± s.e.m. (n = 7 individual larvae for control, 4 for 1× (G4C2)58, and 5 for 2× (G4C2)58). **P < 0.01, one-way ANOVA, Tukey's post hoc test. e, Pan-neuronal expression of (G4C2)58 repeats in Drosophila neurons using the elav-GAL4 driver leads to a significant reduction in bouton number. Bouton number was quantified by examining the presynaptic (anti-HRP) and postsynaptic (anti-DLG1) markers (left). Scale bar, 50 µm. f, Quantification of bouton number (left) and muscle size (right) reveals that both are significantly reduced in Drosophila larvae expressing (G4C2)58 repeats. Values are mean ± s.e.m. (n = 6 individual larvae for control, 5 for 2× (G4C2)8, and 6 for 2× (G4C2)58). **P < 0.01, one-way ANOVA, Tukey's post hoc test. g, Expression of (G4C2)28 and (G4C2)58 but not (G4C2)8 in the muscle using the MHC-GAL4 driver leads to loss of wing control in adult flies (n = 30 individual Drosophila for control, 21 for (G4C2)8, 50 for (G4C2)28, and 24 for (G4C2)58). This phenotype was assessed by examining the permanent wing posture of live adult flies.
Extended Data Figure 2 | RAN translation is observed in Drosophila expressing G4C2 repeats. a, Western blot revealing translation of RAN poly-dipeptides in flies expressing (G4C2)58 in the eye. RAN poly-dipeptides were not found in flies expressing (G4C2)8 or in control flies. Minimal expression of GFP-positive product was observed in flies expressing (G4C2)28. GFP-expressing flies (lane 1) were used as a positive control for the anti-GFP antibody. b, Western blot showing production of RAN product when (G4C2)28 and (G4C2)58 but not (G4C2)8 repeats are expressed in the muscle. RAN products were visualized with anti-GFP antibody (left) and anti-poly(GP) antibody (right). c, The RAN product poly(GP)-GFP from flies expressing (G4C2)58 in the muscle forms large visible inclusions, as visualized by light-sheet fluorescence microscopy (left) and by confocal microscopy (right). Scale bar, 50 µm. d, Expression of (G4C2)58 in salivary gland cells results in the formation of large nuclear inclusions and smaller cytoplasmic inclusions. Scale bar, 50 µm. e, f, Expression of (G4C2)58 in the ventral ganglion by the OK371 driver results in the formation of nuclear and cytoplasmic inclusions, whereas GFP shows diffuse nuclear and cytoplasmic localization. Lamin staining marks the nuclear membrane, and CD8-RFP marks the plasma membrane. Scale bar, 10 µm. g, Pan-neuronal expression of (G4C2)58 with the elav driver results in nuclear and cytoplasmic inclusions. Scale bar, 10 µm.
Extended Data Figure 3 | Ectopic expression of poly(GR) but not poly(GA) or poly(GP) peptides is toxic in Drosophila. a, Transgenic Drosophila were generated that express ATG-driven poly(GA), poly(GR) and poly(GP) peptides with an N-terminal GFP tag (top). GFP-(GA)50 and GFP-(GP)47 were non-toxic when expressed in the eye with GMR-GAL4, whereas GFP-(GR)50 expression resulted in >95% lethality, with surviving adults having severely degenerated eyes (bottom). b, Western blot showing the expression of (G4C2)58, GFP-(GA)50, GFP-(GR)50 and GFP-(GP)47 as visualized in muscle by anti-GFP antibody. c, Western blot showing the expression of poly(GP) in muscle of flies expressing (G4C2)58 and GFP-(GP)47 but not GFP-(GA)50, GFP-(GR)50 or control flies, as visualized by anti-GP antibody. d, Dot blot analysis of RAN peptides in muscle revealing expression of poly(GA) only in GFP-(GA)50 flies and expression of poly(GR) in (G4C2)58 and GFP-(GR)50 flies. As expected, the antisense DPR poly(PR) was not found in any of the lysates. The background protein signal was used as a loading control.
Extended Data Figure 4 | Nuclear import and export is altered by (G4C2)58 expression. a, A threonine-to-asparagine substitution at residue 24 in the Ran protein abolishes its affinity for GTP and reduces its affinity for GDP. Hence, RanT24N is always either nucleotide-free or in its inactive, GDP-bound state, and acts as a dominant negative. RanT24N expression driven by GMR-GAL4 causes a mild eye phenotype when expressed in the absence of (G4C2)58 (upper row, right panel). The (G4C2)58 rough eye phenotype is strongly enhanced by dominant-negative Ran expression (middle row, left panel). The (G4C2)58 eye phenotype is strongly enhanced by knockdown of Nup153 with two independent RNAi lines (middle row, two right panels). The (G4C2)58 eye phenotype is also mildly enhanced by knockdown of transportin (Trn) (bottom row). b, Knockdown of Crm1 in flies expressing (G4C2)58 induces a mild enhancement of the (G4C2)58 eye phenotype (left versus middle). Crm1 knockdown in the absence of (G4C2)58 repeats does not produce a rough eye phenotype (right). c, Expression of two copies of (G4C2)58 in Drosophila motor neurons leads to reduced viability (50%). Chemical inhibition of Crm1 with leptomycin B (500 nM) enhances (G4C2)58 toxicity, further reducing viability (23%). Leptomycin B does not impair viability (100%) in Drosophila expressing GFP. n is displayed on the graph and represents individual pupae from two separate experiments.
Extended Data Figure 5 | Phenotypes of additional suppressors and enhancers of (G4C2)58. a, Phenotypes demonstrating suppression of the (G4C2)58 rough eye phenotype by RNAi knockdown of identified genes. b, Phenotypes demonstrating enhancement of the (G4C2)58 rough eye phenotype by RNAi knockdown of identified genes. c, Knockdown of identified modifier genes shows little or no phenotype in the absence of G4C2 repeat expression.
Extended Data Figure 6 | Impairment of nucleocytoplasmic shuttling in Drosophila and cultured human cell lines. a, (G4C2)58 expression driven by Fkh-GAL4 causes an abnormal nuclear envelope, as shown by Lamin C staining (bottom), in comparison to (G4C2)8 (top). Scale bar, 10 µm. b, Transfection of 293T cells with (G4C2)58 (bottom) but not (G4C2)8 (top) leads to an increase in nuclear RNA puncta, as visualized with a total RNA-FISH probe. Non-transfected cells (lacking GFP signal) do not show an increase in nuclear RNA in either (G4C2)8- or (G4C2)58-transfected cultures. Scale bar, 25 µm. c, Enlarged images showing slowed accumulation of newly synthesized RNA in the cytoplasm of HeLa cells expressing (G4C2)58. Scale bar, 25 µm.
Extended Data Figure 7 | Characterization of newly generated integration-free iPSC lines. a, iPSC lines from a control subject (line 11) and a G4C2 repeat expansion carrier (line 3 and line 8) express the pluripotency markers SSEA-4, Nanog and Oct-4. Scale bar, 50 µm. b, qRT-PCR analysis of expression levels of the pluripotent stem-cell markers SOX2 and Nanog in these iPSC lines, showing no statistical differences between these lines and the human embryonic stem-cell line H9. Relative mRNA levels are quantified from 3 independent experiments. c, After differentiation into cortical neurons, about 90% of cells in these cultures are MAP2-positive neurons. Scale bar, 50 µm. d, Quantification of the average percentage of MAP2-positive neurons shows no difference between control and C9orf72 cultures. Average percentages were quantified from 3 independent experiments. e, Quantification of the average percentage of VGLUT-positive excitatory neurons among all neurons; there is no difference between control and C9orf72 cultures. n = 3 independent experiments.
Extended Data Figure 8 | Karyotyping analysis and pluripotency of newly generated iPSC lines. a, G-band staining showing a normal karyotype for all the lines analysed. b, After in vitro spontaneous differentiation of control and C9 carrier iPSC lines, cells were stained for α-fetoprotein (endoderm), desmin (mesoderm), βIII-tubulin (ectoderm) and Hoechst (nuclei). All lines showed differentiation towards derivatives of the three germ layers. Scale bars, 20 µm.
fibroblasts derived from patients with G4C2 repeat expansion. a, b, Total cellular RNA was measured by FISH in fibroblasts derived from 4 control … cytoplasmic RNA ratio in patient versus control fibroblasts. n = 16 individual cells analysed for each line.
Extended Data Figure 10 | qRT-PCR analysis. a–f, qRT-PCR analysis demonstrating knockdown of selected modifiers in Drosophila eyes. mRNA levels of the selected modifier (a, d), GAL4 (b, e) and (G4C2)58 (c, f) were assayed by qRT-PCR in progeny resulting from wild type (w1118), classical mutant allele or UAS RNAi lines of selected modifiers mated with either GMR-GAL4 or GMR-GAL4/CyO; UAS-G4C2-58-GFP/TM6 to induce knockdown of the selected gene. RNA was obtained from whole Drosophila head lysates. Gene expression levels are mean ± s.d. from n = 3 independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001 by one-way ANOVA, Tukey's post hoc test. g, h, qRT-PCR analysis demonstrating knockdown of Ref1 and Nup50 in the salivary gland. mRNA levels of the selected modifier (left), GAL4 (middle) and (G4C2)58 (right) were assayed by qRT-PCR in progeny resulting from either P(PZ)Ref1 (g) or Nup50/SP (h) mated with Fkh-GAL4, UAS-G4C2-58-GFP/TM6. RNA was obtained from salivary gland lysates. Gene expression levels are mean ± s.d., n = 3 independent experiments. *P < 0.05 by Student's t-test.
Orientation-specific joining of AID-initiated DNA
breaks promotes antibody class switching


Junchao Dong, Rohit A. Panchakshari, Tingting Zhang, Yu Zhang, Jiazhi Hu, Sabrina A. Volpi, Robin M. Meyers, Yu-Jui Ho, Zhou Du, Davide F. Robbiani, Feilong Meng, Monica Gostissa, Michel C. Nussenzweig, John P. Manis & Frederick W. Alt


During B-cell development, RAG endonuclease cleaves immunoglobulin heavy chain (IgH) V, D and J gene segments and orchestrates their fusion as deletional events that assemble a V(D)J exon in the same transcriptional orientation as adjacent Cμ constant region exons. In mice, six additional sets of constant region exons (CHs) lie 100–200 kilobases downstream in the same transcriptional orientation as V(D)J and Cμ exons. Long repetitive switch (S) regions precede Cμ and downstream CHs. In mature B cells, class switch recombination (CSR) generates different antibody classes by replacing Cμ with a downstream CH (ref. 2). Activation-induced cytidine deaminase (AID) initiates CSR by promoting deamination lesions within Sμ and a downstream acceptor S region; these lesions are converted into DNA double-strand breaks (DSBs) by general DNA repair factors. Productive CSR must occur in a deletional orientation by joining the upstream end of an Sμ DSB to the downstream end of an acceptor S-region DSB. However, the relative frequency of deletional to inversional CSR junctions has not been measured. Thus, whether orientation-specific joining is a programmed mechanistic feature of CSR, as it is for V(D)J recombination, and, if so, how this is achieved is unknown. To address this question, we adapt high-throughput genome-wide translocation sequencing into a highly sensitive DSB end-joining assay and apply it to endogenous AID-initiated S-region DSBs in mouse B cells. We show that CSR is programmed to occur in a productive deletional orientation and does so via an unprecedented mechanism that involves in cis Igh organizational features in combination with frequent S-region DSBs initiated by AID. We further implicate ATM-dependent DSB-response factors in enforcing this mechanism and provide an explanation of why CSR is so reliant on the 53BP1 DSB-response factor.

Most chromosomal DSB ends join to ends of separate DSBs genome-wide without orientation (end) specificity. Similarly, non-productive 'inversional' CSR joins have been found in transformed B cells, suggesting that CSR also may not be orientation-specific (Fig. 1a). To address this possibility, we employed digestion–circularization PCR (DC-PCR; Extended Data Fig. 1a) to identify the orientation of CSR joins between Sμ and Sγ1 in purified mouse B cells stimulated with anti-CD40 plus IL4 to activate AID targeting to Sγ1 and Sε, and class switching to IgG1 (and IgE). Most Sμ to Sγ1 junctions identified by this semi-quantitative approach were deletional (Extended Data Fig. 1b).

To confirm DC-PCR findings and analyse potential mechanisms, we used high-throughput genome-wide translocation sequencing (HTGTS), an unbiased genome-wide approach that identifies 'prey' DSB junctions to a fixed 'bait' DSB with nucleotide resolution (Extended Data Fig. 1c). We refer to broken ends of bait Igh DSBs as 5′- and 3′-broken ends; specific primers allow use of each as bait (Fig. 1b, c). Prey junctions are denoted + if prey is read from the junction in a centromere-to-telomere direction and − if in the opposite direction (Fig. 1b, c). The + and − outcomes for intrachromosomal joining of broken ends of different DSBs on the same chromosome include rejoining of a DSB after resection, or joining the broken ends of two separate DSBs to form intrachromosomal inversions, deletions or excision circles (Fig. 1b, c). To assess the relative frequency at which non-AID-initiated Igh DSBs join in deletional versus inversional orientation, we expressed I-SceI endonuclease in anti-CD40/IL4-activated AID-deficient B cells in which I-SceI targets were inserted upstream of Sμ and downstream of Sγ1 (Igh^I-96k allele; Extended Data Fig. 1d, e), or in AID-sufficient B cells in which Sμ and Sγ1 were replaced with I-SceI targets (ΔSμ^I-SceI/ΔSγ1^I-SceI allele; Fig. 1d and Extended Data Fig. 1f). HTGTS with primers that captured junctions involving 3′- or 5′-broken ends of I-SceI bait DSBs in the Sγ1 locale revealed that a major class of recovered junctions were rejoins of bait DSBs following resection (Fig. 1d and Extended Data Fig. 1d–f). A second major class of bait junctions in the Sγ1 locale involved intact or resected 3′- or 5′-broken ends of the I-SceI-generated DSB in the Sμ locale; these comprised relatively similar numbers of deletional (+) and inversional (−) junctions for bait 3′-broken ends (Fig. 1d and Extended Data Fig. 1d) and similar numbers of excision circle (−) versus inversional (+) junctions for bait 5′-broken ends (Extended Data Fig. 1e, f). As expected, bait 3′- and 5′-broken ends from the Sγ1 locale recovered similar levels of + and − junctions genome-wide (Extended Data Fig. 2a–d). We conclude that joining between two I-SceI DSBs in different Igh S-region locations in CSR-activated B cells lacks any notable preference for or against inversional versus deletional joins.
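The orientation calls above reduce to counting + and − prey junctions within a prey region. Which sign counts as deletional, inversional or excision circle depends on the bait end and on whether the prey lies upstream or downstream, as the paragraph describes. The sketch below is hypothetical: it is not the HTGTS pipeline, and it assumes junctions are already mapped as (chromosome, position, sign) tuples. The caller supplies the sign of interest, for example + for deletional joins of bait 3′-broken ends to an upstream prey.

```python
from collections import Counter

def orientation_summary(junctions, region, selected_sign):
    """Tally prey-junction orientations inside one prey region.

    junctions: iterable of (chrom, position, sign) tuples; sign is '+' if the
        prey is read centromere-to-telomere from the junction, else '-'.
    region: (chrom, start, end) of the prey region, e.g. an S region.
    selected_sign: the sign corresponding to the outcome of interest for this
        bait end and prey location (supplied by the caller, not inferred).
    """
    chrom, start, end = region
    counts = Counter(
        sign for c, pos, sign in junctions
        if c == chrom and start <= pos <= end
    )
    plus, minus = counts["+"], counts["-"]
    selected = counts[selected_sign]
    other = minus if selected_sign == "+" else plus
    total = plus + minus
    return {
        "+": plus,
        "-": minus,
        # Fraction of in-region junctions with the selected orientation.
        "fraction": selected / total if total else float("nan"),
        # E.g. an inversion:deletion ratio when '+' marks deletional joins.
        "other_to_selected": other / selected if selected else float("inf"),
    }
```

An unbiased joining process, like the I-SceI-to-I-SceI joins described above, would give a `fraction` near 0.5. The orientation-specific S-region joins reported below would push it toward 1.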

In AID-deficient Igh^I-96k B cells, I-SceI 5′- and 3′-broken-end baits downstream of Sγ1 did not capture Igh DSB hotspots beyond I-SceI-generated broken ends upstream of Sμ (Extended Data Fig. 1d, e). In contrast, I-SceI 5′- and 3′-broken ends from the ΔSμ^I-SceI/ΔSγ1^I-SceI allele in AID-sufficient B cells joined frequently to AID-initiated Sε DSBs 60 kilobases (kb) downstream (Fig. 1d and Extended Data Fig. 1f). The majority (~80%) of 3′ and 5′ ΔSμ^I-SceI/ΔSγ1^I-SceI broken-end joins were distributed across the 4-kb Sε in orientations that generate, respectively, excision circles (Fig. 1d) or deletions (Extended Data Fig. 1f). We also performed HTGTS on activated, I-SceI-expressing B cells in which only Sγ1 was replaced by an I-SceI cassette (ΔSγ1^I-SceI allele; Fig. 1e). Beyond break-site junctions, the major Igh hotspot regions of 3′ ΔSγ1^I-SceI broken ends were Sμ and Sε (Fig. 1e and Extended Data Fig. 2j). Junctions occurred broadly across Sμ, with 80% in a deletional orientation, whereas 90% of Sε junctions were in the reciprocal excision circle orientation (Fig. 1e; Extended Data Fig. 2j). CH12F3 B lymphoma cells in which Sα was replaced with an I-SceI site had a similar orientation bias of Sα I-SceI 3′-broken-end joining to Sμ DSBs (Extended Data Fig. 2n–q). Joining of the 5′-broken ends of ΔSμ^I-SceI (on the ΔSμ^I-SceI/ΔSγ1^I-SceI allele) to AID-initiated DSBs in Sγ3, Sγ2b and Sγ2a in lipopolysaccharide plus anti-IgD-dextran-activated B cells was similarly orientation-biased (Extended Data Fig. 3a–c). However, joining of the 5′-broken ends of ΔSμ^I-SceI across an array of 28 × I-SceI sites replacing Sγ1 was not orientation-biased (Extended Data Fig. 3d, e). Together, these findings suggest that orientation-specific CSR joining requires an S-region sequence and/or unique aspects of S-region DSBs.

Mammalian S regions are G-rich on the non-template strand, giving AID-initiated 5′ and 3′ S-region broken ends a potential end-sequence bias. Also, when transcribed in the sense direction, S regions generate stable R-loops, which could differentially affect 5′ and 3′ S-region broken-end structure. To test the potential roles of S regions in orientation-specific CSR, we used a Cas9/gRNA approach to invert Sμ on the productive allele of CH12F3 B cells, which modestly reduced CSR (Extended Data Fig. 3f–h). We then assayed CH12F3 cells in which Sα was replaced with an I-SceI site and Sμ was in a normal or inverted orientation. These assays revealed that joins of I-SceI-generated 3′-broken ends at the Sα locale to Sμ DSBs were similarly biased towards deletional junctions, independent of Sμ orientation (Fig. 2a–c).


Consistent with low-level trans CSR, HTGTS libraries from activated ΔSμ^I-SceI/ΔSγ1^I-SceI B cells contained numerous junctions from ΔSγ1^I-SceI 3′-broken ends across the trans Sμ; in contrast to cis ΔSγ1^I-SceI 3′-broken-end Sμ junctions, these occurred in + and − orientations at similar frequencies (Fig. 2d). Likewise, bait 3′-broken ends from the ΔSγ1^I-SceI Igh allele identified approximately equal numbers of (+) versus (−) junctions to AID off-target DSBs in Il4ra on chromosome 7 (Extended Data Fig. 2e). Finally, translocations between bait 5′ I-SceI DSB broken ends in c-myc^I-SceI and prey AID-initiated Sμ and Sε broken ends in CSR-activated B cells lacked orientation bias (Fig. 2e). We conclude that orientation-dependent CSR joining does not require orientation-associated features of Sμ sequence, transcription or transcripts. Moreover, AID-initiated DSBs per se are not sufficient to promote orientation specificity, as demonstrated by the orientation-independence of DSB joining to them in trans. Thus, beyond S-region sequences and/or high-frequency AID-initiated DSBs within them, aspects of Igh locus organization in cis must play a critical role in promoting orientation-dependent CSR joining.
Figure 2 | S regions are not sufficient to

We tested whether joining between two sets of endogenous AID-initiated S-region DSBs is orientation-dependent. Use of core S-region DSBs as HTGTS bait is confounded by their highly repetitive nature. Therefore, we used as bait a 150-base-pair (bp) sequence at the 5′ end of Sμ (5′Sμ), which retains 14 of approximately 500 Sμ AID-target motifs (Fig. 3a, left panel). HTGTS of anti-CD40/IL4-stimulated B cells with the 5′Sμ broken-end primer revealed break-site junctions, as well as Sγ1 and Sε junctions (Fig. 3b, c). Consistent with AID initiation, bait junctions were enriched at AID targets within the 5′Sμ bait (Fig. 3a, right panel). 5′Sμ broken-end junctions spread broadly over prey S regions, with up to 95% in a deletional orientation (Fig. 3c). For comparison, we tested a 150-bp 5′ remnant of Sμ (rSμ; Extended Data Fig. 4a, left panel), retained when the rest of Sμ was deleted. B cells homozygous for rSμ have reduced IgG1 CSR but nearly normal IgE CSR. HTGTS with either 5′ rSμ or 3′ rSμ broken-end primers of anti-CD40/IL4- and lipopolysaccharide/anti-IgD-dextran-stimulated B cells, respectively, revealed junctions to Sγ1 and Sε and to Sγ3, Sγ2b and Sγ2a (Extended Data Fig. 4). 5′ rSμ broken-end junctions spread over target S regions, with >90% in a deletional orientation (Extended Data Fig. 4b, f), whereas >90% of 3′ rSμ broken-end junctions were in the complementary excision circle orientation (Extended Data Fig. 4c, g). Within the bait rSμ, junctions again were enriched at AID targets (Extended Data Fig. 4a). Consistent with IgH class-switching patterns, rSμ HTGTS junctions occurred more frequently to Sε than those from the 5′Sμ bait in the context of full-length Sμ (Extended Data Fig. 4b). Analyses of rSμ-mutant CH12F3 cells gave similar results (Extended Data Fig. 5a–c). Thus, AID-initiated Sμ DSB joining to all downstream acceptor S regions is strongly biased towards the deletional orientation.

CSR DSBs generate a DSB response (DSBR) in which ATM activates histone H2AX and 53BP1 in chromatin flanking DSBs, thereby contributing to end joining. ATM or H2AX deficiency moderately reduces CSR (Extended Data Fig. 6a). However, 53BP1 deficiency causes a more drastic reduction (Extended Data Fig. 6a), suggesting specialized CSR roles … S-region synapsis or protecting S-region DSBs from resection.

To elucidate influences on orientation-specific CSR, we employed HTGTS to assay joining of AID-initiated 5′Sμ broken ends to AID-initiated Sγ1 and Sε DSBs in anti-CD40/IL4-activated ATM-, H2AX- and 53BP1-deficient B cells, as well as in B cells deficient for Rif1, a 53BP1-associated factor that mediates resection blocking. ATM-, H2AX- and Rif1-deficient B cells had fewer Sγ1 and Sε junctions than wild type; 53BP1-deficient B cells had a greater reduction, with most junctions localizing to the break-site region (Fig. 3d, e and Extended Data Fig. 6b–d). Most break-site junctions were resections, which were longest (up to about 6 kb) in 53BP1 deficiency (Extended Data Fig. 6e, f; see extended discussion in Supplementary Information for Extended Data Fig. 6f). Compared to wild type, bait 5′Sμ junctions to Sγ1 and Sε DSBs in different DSBR-deficient backgrounds had varying decreases in orientation specificity, with H2AX deficiency having the smallest and 53BP1 deficiency the largest (Figs 3d, e and 4a; Extended Data Fig. 6c, d and Extended Data Table 1a, b). Indeed, residual junctions of 5′Sμ to Sγ1 and Sε locales in 53BP1-deficient B cells showed relatively normalized inversion:deletion ratios (Fig. 4a), a finding confirmed by DC-PCR (Extended Data Fig. 1b). Finally, 53BP1 deficiency did not affect the joining orientation of 5′Sμ and 3′Sγ1 I-SceI-generated broken ends in AID-deficient Igh^I-96k B cells (Extended Data Fig. 1g).

Figure 3 | Orientation-biased joining of AID-initiated endogenous S-region breaks. a, Left, 150-bp 5′Sμ sequence used as HTGTS bait. Red arrow denotes 5′Sμ primer. Red and blue vertical lines indicate AGCT or other AID-targeting motifs, respectively. Right, distribution and frequency of 5′Sμ break points in junctions to downstream S regions recovered from anti-CD40/IL4-stimulated wild-type B cells. Asterisks indicate positions of AGCT or other RGYW motifs.

Owing to the potential difficulty in measuring relative resection of recurrent rejoins at or near the break site, we focused on prey S-region broken-end resections (Extended Data Figs 1 and 6; see extended discussion in Supplementary Information for Extended Data Figs 1d, e and 6f). Because S regions are long and AID-initiated DSB locations within them are diverse, we estimated relative resection by quantifying bait broken end to prey broken end junctions downstream of S-region positions where the incidence of wild-type junctions decreases to background (Fig. 3b–e). Based on this 'long' S-region resection assay, ATM- and H2AX-deficient cells had modest resection increases, Rif1-deficient cells slightly greater increases, and 53BP1-deficient B cells far greater increases, which were also apparent as a 'flattening' of Sγ1 and Sε junction profiles relative to other backgrounds (Figs 3c–e and 4b; Extended Data Fig. 6c, d and Extended Data Table 1c, d). HTGTS assays of rSμ bait broken-end junctions to Sγ1 and Sε (Extended Data Fig. 7) and of I-SceI-generated 3′ ΔSγ1^I-SceI broken-end bait junctions to Sμ and Sε (Extended Data Fig. 8) gave similar results. In H2AX- or Rif1-deficient B cells, a large fraction of 5′Sμ junctions were within S regions, the main difference from wild type being a subset of junctions extending beyond S regions, probably reflecting extensive resection of broken ends that were not rapidly fused (Fig. 4b and Extended Data Fig. 6c, d). Treatment of 53BP1-deficient activated B cells with ATM kinase inhibitor substantially diminished very long S-region resections, but did not restore orientation-dependent joining (Fig. 4a, b; Extended Data Table 1 and Extended Data Fig. 9a–f). This finding may reflect shorter resections in inhibitor-treated 53BP1-deficient versus ATM-deficient B cells that are not revealed by our long resection assay. Another possibility would involve a putative specialized role for 53BP1 in stabilizing synapsed S regions.
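The 'long' S-region resection assay above has two steps. First, find the position beyond which wild-type junction incidence falls to background. Second, score the fraction of a sample's junctions that lie beyond that position. The sketch below is a hypothetical rendering of this logic, not the authors' analysis code. The bin size, the per-bin background threshold and the convention that coordinates increase in the direction of resection are all assumptions; the text does not specify these parameters.

```python
import numpy as np

def wt_cutoff(wt_positions, region_start, region_end, bin_size, background):
    """Position after which wild-type junction counts per bin stay at or
    below `background` (assumes resection runs toward larger coordinates)."""
    edges = np.arange(region_start, region_end + bin_size, bin_size)
    counts, _ = np.histogram(wt_positions, bins=edges)
    above = np.nonzero(counts > background)[0]
    if above.size == 0:
        return region_start
    # Right edge of the last bin whose count still exceeds background.
    return edges[above[-1] + 1]

def long_resection_fraction(positions, cutoff):
    """Fraction of junctions lying beyond the wild-type-derived cutoff."""
    positions = np.asarray(positions)
    if positions.size == 0:
        return float("nan")
    return np.count_nonzero(positions > cutoff) / positions.size
```

The cutoff is derived once from wild-type junctions and then applied unchanged to each DSBR-deficient background. A larger fraction would then indicate more long resections, analogous to the 'flattening' of junction profiles described for 53BP1 deficiency.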

Figure 4 | Mechanistic roles of Igh organization

We demonstrate that CSR is mechanistically programmed to occur in a productive deletional orientation. Based on our findings, we propose a working model for orientation-specific CSR, in which a key component is the organization of S regions within topologically associated domains (TADs) that promote frequent S-region synapsis via Langevin motion (Fig. 4c). Within such TADs, we implicate additional Igh-specific organizational features, not yet fully elucidated, as playing a fundamental role in mediating synapsis in an orientation that promotes deletional joining (Fig. 4c). We find that the functions of such organizational features are complemented by S regions, potentially through their ability to promote AID-initiated DSBs, multiple frequent DSBs, or both. Our studies also implicate DSBR factors in enforcing this mechanism (Fig. 4d). The broader DSBR probably contributes by tethering unsynapsed S-region DSBs for efficient rejoining, keeping them from separating into chromosomal breaks that could frequently translocate, with orientation independence, to S-region broken ends within the TAD; this function would also allow subsequent AID-initiated breakage and joining to a synapsed S region (Fig. 4c). DSBR factors also prevent long end resections that could cause S-region broken ends to linger in resection complexes, preventing synapsis with other S-region broken ends and/or diminishing their ability to be joined by classical non-homologous end joining (Fig. 4d). Different DSBR factors have differential impacts on tethering versus resection inhibition and, thus, may affect orientation dependence via different routes. For example, ATM deficiency inhibits resection by impairing CtIP activation, but promotes resection via other nucleases by impairing the inhibitory activities of H2AX, 53BP1 and, indirectly, Rif1 (Fig. 4d). 53BP1 deficiency is unique in that it both impairs tethering for rejoining and activates resection of unjoined ends through failure to activate Rif1, leading to extreme resections and the greatest impairment of CSR and orientation-dependent joining (Fig. 4d). Because the common and unique impacts of 53BP1 deficiency markedly affect both donor and acceptor S regions, they would be multiplicative and thereby explain the profound impact of 53BP1 deficiency on CSR.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Mice. Igh" °° AID−/− (ref. 11), ΔSμ^I-SceI/ΔSγ1^I-SceI chimaera, ASy1?*"S“! (ref. 12), Sμ (ref. 17), c-myc^I-SceI (ref. 4), ATM−/− (ref. 30), H2AX−/− (ref. 31), 53BP1−/− (ref. 32) and Rif1^F/F CD19^cre (ref. 33) lines have been reported previously.
Mouse work was performed under protocols approved by the Boston Children’s 
Hospital and the Rockefeller University Institutional Animal Care and Use 
Committees. 

Plasmids and oligonucleotides. Oligonucleotides for gRNAs for CRISPR/Cas9-mediated targeting of various Igh regions were cloned into the pX330 vector (Addgene plasmid ID 42230) as described. The target sequences of Cas9 constructs are listed in the DNA oligonucleotides table in the Supplementary Information. The exchange vector (pLH28) with heterologous loxP sites was obtained from K. Yu. A 200-bp GFP-derived sequence was amplified, ligated to an I-SceI recognition sequence and subsequently introduced into the pLH28 vector to make the pLH-1X I-SceI exchange vector. To obtain the I-SceI expression plasmid for transducing CH12 cell lines, the I-SceI-IRES-GFP fragment was shuttled from a retroviral construct (pMX-I-SceI-IRES-GFP) into the pcDNA3.0 (Invitrogen) vector.

B-cell culture, transduction and FACS analysis. Mature splenic B cells isolated using a CD43-negative selection kit (MACS) were cultured in lymphocyte medium R15 (RPMI1640, 15% FBS, L-glutamate, 1× penicillin and streptomycin). B-cell stimulation was performed with anti-CD40 (1 μg ml−1, eBioscience) plus IL4 (20 ng ml−1, PeproTech) or LPS (25 ng ml−1, Sigma) plus anti-IgD-dextran (3 ng ml−1, a gift from R. Casellas) for 96 h. Infection with I-SceI expression or control retrovirus was carried out at day 1 post-stimulation by the standard spinning method in the presence of 4 μg ml−1 polybrene as previously described. Efficiency of retrovirus infection and switching levels were evaluated by flow cytometry as previously described. Where indicated, the ATM inhibitor KU-55933 (Tocris) was added to stimulated cells at day 1 post-stimulation to a final concentration of 10 μM and was maintained during the course of the experiment until collection of the cells for FACS and HTGTS libraries.

Cell lines and nucleofection. CH12F3 cell line stimulation to IgA was performed as described. CH12F3 cells with recombinase-mediated cassette exchange (RMCE) in place of the endogenous Sα region, referred to as 1F7 cells, were maintained at 37 °C and 5% CO2 and cultured in RPMI medium with 10% FCS, 0.5% penicillin/streptomycin and 50 μM β-mercaptoethanol. An exchange vector with heterologous loxP sites containing a 1× I-SceI site embedded in 200 bp of GFP-derived sequence was cloned. RMCE was performed as previously described. Exchanged ΔSα^I-SceI clones were verified by PCR, Sanger sequencing and Southern blotting. ΔSα^I-SceI cells were then stimulated with anti-CD40, IL4 and TGF-β for 15 h, followed by nucleofection with the pcDNA-I-SceI-IRES-GFP expression vector using the 4D-Nucleofector X (Lonza, solution SF, protocol CA-137), and re-plated in stimulation-conditioned medium. On day 3 post-stimulation, cells were collected and gDNA was isolated for HTGTS library preparation. Cells were not tested for mycoplasma contamination.

To obtain CH12F3 (productive allele Sμ(INV), non-productive allele ΔSμ-Sα) cells, wild-type CH12F3 cells were first nucleofected using the 4D-Nucleofector X (Lonza, solution SF, protocol CA-137) with the gRNA vectors to excise the sequences between the JH4 intron and ~130 bp downstream of Cα polyadenylation on the non-coding allele that has already switched to Sα. Single-cell subclones were seeded into 96-well plates 12 h post-nucleofection, and the resulting clones were screened by PCR and Southern blot. One confirmed positive clone was further modified by gRNA vectors targeted at the 5′Sμ_1 and 3′Sμ regions to invert the Sμ (~4 kb) sequence. Initial screening for positive clones was performed by PCR, followed by Southern blotting and Sanger sequencing of the inversion junction. The resultant cells were stimulated with anti-CD40, IL4 and TGF-β, and IgA CSR was measured by FACS on days 2 and 3 post-stimulation. ΔSα^I-SceI Sμ(INV) cells were obtained by targeting the aforementioned 1× I-SceI RMCE-positive cells with gRNAs targeting 5′Sμ_2 and 3′Sμ to invert the Sμ sequence as above. The resultant positive clones were verified by PCR, Southern blotting and Sanger sequencing of the inversion junction. To make rSμ-CH12F3 cells, the aforementioned CH12F3 (non-productive allele ΔSμ-Sα) cells were used to further truncate Sμ sequences on the coding allele with gRNAs targeting 5′Sμ_2 and 3′Sμ. Single-cell deletion subclones were screened and confirmed by PCR and Southern blot. The resultant rSμ-CH12F3 cells were stimulated with anti-CD40, IL4 and TGF-β and harvested on days 2 and 3 for gDNA isolation for HTGTS library preparation.

DC-PCR. The DC-PCR assay was performed as described previously. In brief, genomic DNA was isolated from day 4 anti-CD40/IL4-stimulated B cells and purified by phenol–chloroform extraction. Five micrograms of genomic DNA was digested overnight with 20 U of EcoRI (Roche). Ligations were performed under dilute conditions to promote circularization: digested DNA was ligated overnight at 16 °C at a concentration of 1.8–9 ng μl−1 in a total volume of 100 μl per reaction. Three to four ligation reactions were pooled, column purified, concentrated and serially diluted at a 1:5 ratio. PCR was then performed in 50 μl per reaction using 2.5 U Taq (Qiagen) with serially diluted DNA starting from ~50–150 ng. Primers were designed to amplify the Sμ–Sγ1 rearrangements that occur during CSR to IgG1, either by direct chromosomal joining of Sμ–Sγ1 with excision of circular DNA or by inversion of the sequences between the broken ends of Sμ and Sγ1. As a control for EcoRI digestion and circularization of input DNA, an EcoRI fragment of the nicotinic acetylcholine receptor β subunit gene (CHRNB1) was amplified; after EcoRI digestion and circularization, it generates a 753-bp DC-PCR product. To quantify the amount of direct or inversion joins amplified by PCR, DC-PCR products of direct or inversion joins were cloned into the pCR2.1-TOPO TA vector. Precise plasmid concentrations were determined and a standard curve was generated ranging from 4 to 10,000 copies per reaction. After running on a 1% agarose gel, PCR fragments were transferred to nitrocellulose membrane and hybridized to a 3′Sγ1 probe according to standard Southern blotting procedures. Primers for direct joining PCR: forward, 5′-CATGAGAGCTGGAGCTAGTATGAAGGTG-3′; reverse, 5′-ACTGACTGACTGAGTGTCCTCTCAAC-3′. Primers for inversional joining PCR: forward, 5′-CAGTCACAGAGAAACTGATCCAGGTGAG-3′; reverse, 5′-CCATAGCAGTTGGTCAATCCTTGTCTCC-3′. Primers for control CHRNB1 DC-PCR: forward, 5′-GCGCCATCGATGGACTGCTGTGGGTTTCACCCAG-3′; reverse, 5′-GGCCGGTCGACAGGCGCGCACTGACACCACTAAG-3′. Oligonucleotide probe for the detection of both deletional and inversional CSR joining products: Sγ1, CCTGGGTAGGTTACAGGTCAAGGCT.
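The plasmid standard curve (4 to 10,000 copies per reaction) is what converts Southern band signals from the serially diluted samples into copy numbers. The fitting procedure is not stated above, so the following is only a sketch under assumptions: band intensities have already been quantified, signal is taken to be linear in copy number on a log–log scale within the standard range, and a sample read at 1:5 dilution step `k` is scaled back by 5**k. The function names and the example signals are hypothetical.

```python
import math


def fit_loglog(copies, signals):
    """Least-squares fit of log10(signal) = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_signal(signal, slope, intercept, dilution_step=0, fold=5):
    """Invert the standard curve, then undo the assumed 1:fold serial dilution."""
    copies_in_reaction = 10 ** ((math.log10(signal) - intercept) / slope)
    return copies_in_reaction * fold ** dilution_step


# Hypothetical standards spanning 4 to 10,000 copies, with made-up signals.
standard_copies = [4, 20, 100, 500, 2500, 10000]
standard_signal = [3 * c for c in standard_copies]
slope, intercept = fit_loglog(standard_copies, standard_signal)
# A band of signal 300 read at the second 1:5 dilution (i.e. 1:25).
estimate = copies_from_signal(300, slope, intercept, dilution_step=2)
```

Only signals that fall within the range of the standards should be read off the curve, which is one practical reason to run each sample at several dilutions.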

High-throughput genome-wide translocation sequencing (HTGTS). HTGTS libraries were generated by emulsion-mediated PCR (EM-PCR) and linear-amplification-mediated PCR (LAM-PCR) methods as described in ref. 5. In brief, sonicated (Bioruptor, Diagenode) gDNA was subjected to LAM-PCR using 1 U Taq polymerase (Qiagen) per reaction with a single biotinylated primer for 50 cycles (94 °C for 180 s; then 94 °C for 30 s, 58 °C for 30 s and 72 °C for 90 s). One more unit of Taq polymerase was added to the reaction mixture to run PCR for an additional 50 cycles. Biotinylated DNA fragments were captured with Dynabeads MyOne streptavidin C1 beads (Invitrogen) at room temperature for 1 h, followed by on-bead ligation at 25 °C for 2 h with bridge adapters in the presence of 15% PEG-8000 (Sigma) and 1 mM hexammine cobalt chloride (Sigma). After the beads were washed with B&W buffer as described by the manufacturer, ligated products were subjected to 15 cycles of on-bead PCR with Phusion polymerase (Fisher) using locus-specific and adaptor primers, followed by blocking digestion with appropriate restriction enzymes to remove uncut germline gDNA. A third round of tagging PCR to add Illumina MiSeq-compatible adapters at the 5′ and 3′ ends of the second-round PCR product was carried out for another 10 cycles with Phusion polymerase. PCR products were size-fractionated for DNA fragments between 300 and 1,000 bp on a 1% agarose gel and column purified (Qiagen) before loading onto an Illumina MiSeq machine for sequencing.

Data analyses. Data analysis of MiSeq sequencing reads has been described in ref. 5. In brief, de-multiplexing of the MiSeq reads was performed using the fastq-multx tool from ea-utils (https://code.google.com/p/ea-utils/) and adaptor sequence trimming was performed using the SeqPrep utility (https://github.com/jstjohn/SeqPrep). Reads were mapped using Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) either to mm9 (for libraries generated with Rif1 knockout cells and CH12F3-derived cells) or to a modified mm9 reference genome (for all other genotypes) containing the 176-kb Igh constant region of the 129S genome, in which the region chr12:114493849–114665808 of mm9 was replaced with the DNA sequence from 1416975 to 1593283 of the 129S Igh reference sequence AJ851868.3. Where necessary, for instance when aligning reads to the Sμ^I-SceI locus on the Igh’** allele, we further modified the custom 129S_IgHC genome by inserting the cassette sequences so that it accurately reflected the genomic changes before aligning MiSeq reads with Bowtie2. The CH12F3 clone was derived from the CH12.LX lymphoma cell line. CH12.LX cells were subcloned from the original CH12 lymphoma cell line, which originated from a C57BL/10 mouse substrain double congenic for H-2* H-4? (ref. 39). C57BL/10 and C57BL/6 are both substrains of C57BL, and we therefore use BL/6 (mm9) as the reference genome when running our HTGTS data analysis pipeline on libraries made with CH12F3 cells. To reflect additional genome modifications (for example, Sμ(INV) shown in Fig. 2b), the mm9 genome sequence was modified accordingly.
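Building the modified reference amounts to splicing one interval into another. The minimal sketch below assumes both coordinate ranges are 1-based and inclusive (the usual reading of chr:start–end strings; the text does not say) and checks the lengths involved: the replaced mm9 interval is 171,960 bp and the inserted AJ851868.3 interval is 176,309 bp, consistent with the quoted 176-kb Igh constant region, so downstream chr12 coordinates would shift by 4,349 bp. Loading sequences from FASTA is omitted; `replace_region` and `interval_length` are illustrative names.

```python
def interval_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def replace_region(seq, start, end, replacement):
    """Return seq with the 1-based inclusive interval [start, end] swapped for replacement."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("interval lies outside the sequence")
    return seq[: start - 1] + replacement + seq[end:]


mm9_len = interval_length(114493849, 114665808)  # region removed from mm9 chr12
igh129_len = interval_length(1416975, 1593283)  # region taken from AJ851868.3
shift = igh129_len - mm9_len  # offset for chr12 coordinates downstream of the splice

# Toy example: positions 5-8 ("CCCC") are swapped for "TT".
toy = replace_region("AAAACCCCGGGG", 5, 8, "TT")
```

The same helper would serve for inserting cassette sequences (start = end + 1 style insertions would need a separate insert function; that case is not covered here).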

A best-path searching algorithm (based on the YAHA read aligner and break point detector) was used to select optimal sequence alignments from Bowtie2-reported top alignments with an alignment score above 50, which represents a perfect 25-nucleotide (nt) local alignment. To avoid detecting possible mis-priming events, we set a bait alignment threshold of at least ten perfectly aligned nucleotides extending from the end of the cloning primer. Aligned reads were subsequently filtered on the following criteria: (1) reads must include both a bait alignment and a prey alignment; and (2) the bait alignment cannot extend more than 10 nt beyond the targeted site. For reads mapped to repetitive low-mappability regions, multiple competing alignments with identical or similar scores exist, and the coordinates of the best alignment are randomly chosen among the competing ones. For junctions mapped to each individual repetitive S region, there are no competing alignments from outside that region, as shown by simulation (see details below), although the exact junction coordinate within the region could not be identified. We also applied a filter to remove duplicates (referred to as 'de-dup' hereafter), in which the coordinates of the end of the bait alignment and the start of the prey alignment were compared across all reads. A read is marked as a duplicate if its bait and prey alignment coordinates are within 2 nt of another read's bait and prey alignments. To plot all S-region junctions, we took the junctions removed by the mappability filter but unequivocally mapped to S regions, removed duplicates with the de-dup program described above, and combined them with 'good' reads passing both the mappability and de-dup filters. A grey box over S regions (for example, Sμ and Sγ1) in the figures denotes the repetitive regions of these S sequences in which the randomly assigned mappability-filtered reads were included. Additionally, we applied post-filtering stringencies to remove junctions mapped to simple sequence repeats and telomere repeats, as well as reads with excessive microhomology (>20 nt) or insertions (>30 nt), before further analysis. Finally, the combined and cleaned junctions were plotted genome-wide or onto the desired S regions using the PlotRegion tool (for details see section below). Scripts and details of pipeline parameters are available upon request.
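The read-level filters and the de-dup rule described above can be sketched as follows. This is not the authors' pipeline (their scripts are available on request) but a minimal illustration assuming each read has already been reduced to a hypothetical `Junction` record, that duplicates must also share prey chromosome and strand, and that each read is compared only with reads already kept (a greedy, quadratic pass that is fine for illustration). Thresholds mirror the text: at least 10 perfectly aligned bait nucleotides, at most 10 nt of bait overshoot, microhomology up to 20 nt, insertions up to 30 nt and a 2-nt duplicate window. The Bowtie2 score cutoff and mappability handling are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Junction:
    bait_end: int                 # coordinate where the bait alignment ends
    prey_chrom: Optional[str]     # None if the read has no prey alignment
    prey_start: int               # coordinate where the prey alignment starts
    prey_strand: str              # "+" or "-"
    bait_matched: int = 25        # perfectly aligned nt extending from the primer end
    bait_overshoot: int = 0       # nt the bait alignment extends past the targeted site
    microhomology: int = 0        # junctional microhomology (nt)
    insertion: int = 0            # junctional insertion (nt)


def passes_filters(j: Junction) -> bool:
    """Bait/prey requirements plus the microhomology and insertion post-filters."""
    return (
        j.prey_chrom is not None
        and j.bait_matched >= 10
        and j.bait_overshoot <= 10
        and j.microhomology <= 20
        and j.insertion <= 30
    )


def dedup(junctions: List[Junction], window: int = 2) -> List[Junction]:
    """Keep the first read of any group whose bait end and prey start lie within `window` nt."""
    kept: List[Junction] = []
    for j in junctions:
        is_duplicate = any(
            k.prey_chrom == j.prey_chrom
            and k.prey_strand == j.prey_strand
            and abs(k.bait_end - j.bait_end) <= window
            and abs(k.prey_start - j.prey_start) <= window
            for k in kept
        )
        if not is_duplicate:
            kept.append(j)
    return kept


reads = [
    Junction(100, "chr12", 5000, "+"),
    Junction(101, "chr12", 5002, "+"),                     # within 2 nt: duplicate
    Junction(100, "chr12", 5010, "+"),                     # distinct junction
    Junction(100, "chr12", 6000, "+", bait_overshoot=15),  # fails the overshoot filter
    Junction(100, None, 0, "+"),                           # no prey alignment
]
clean = dedup([r for r in reads if passes_filters(r)])
```

Because the greedy pass keeps the first read of each cluster, the result depends on input order when clusters chain (A near B, B near C, A far from C); whether the original tool behaves the same way is not stated.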

Pipeline simulation for S-region mappability. Results of the S-region mappability simulations are available in the Supplementary Information.

S-region junction plotting. As described above, junctions removed by the mappability filter are retrieved and de-duplicated before being combined with normal junctions. To plot junction coordinates onto individual S regions or the entire Igh constant region, combined junctions are binned with the PlotRegion tool into 100 bins (bin size varies with the length of the target region onto which libraries are plotted) on the basis of junction coordinates and joining orientation. The bincount file (histogram information for junction distribution in both joining orientations) generated by the PlotRegion tool is used to calculate the percentage of junctions in each bin, in either the + or − orientation, out of the total number of junctions mapped to the region of interest. The results were then plotted as linear graphs with Prism software. Note that the scale bar on top of each graph is fixed at 1/10 of the size of the plotted region and is therefore always 10× the bin size.
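The percentage calculation applied to the PlotRegion bincount output can be sketched as below. PlotRegion's file format is not described here, so this hypothetical `bin_percentages` function works on plain (position, orientation) pairs and assumes a half-open plotted region [start, end). With 100 bins, the returned scale bar (1/10 of the region) equals 10 bin widths.

```python
def bin_percentages(junctions, start, end, n_bins=100):
    """Percent of all in-region junctions falling in each bin, per orientation."""
    size = end - start
    counts = {"+": [0] * n_bins, "-": [0] * n_bins}
    total = 0
    for pos, orientation in junctions:
        if start <= pos < end:  # junctions outside the plotted region are ignored
            index = (pos - start) * n_bins // size  # integer bin index 0..n_bins-1
            counts[orientation][index] += 1
            total += 1
    percentages = {
        o: [100.0 * c / total if total else 0.0 for c in bins]
        for o, bins in counts.items()
    }
    bin_size = size / n_bins
    scale_bar = size / 10  # 1/10 of the plotted region
    return percentages, bin_size, scale_bar


pct, bin_size, scale_bar = bin_percentages(
    [(5, "+"), (15, "-"), (15, "+"), (999, "-"), (2000, "+")], start=0, end=1000
)
```

Because both orientations share one denominator, the + and − percentages together sum to 100% for the plotted region.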
Calculation of joining orientation bias and acceptor S-region resection. For simplicity, joining from 5′Sμ to downstream Sγ1 and Sε breaks is used to explain orientation bias and resection of acceptor S-region DSBs. Junctions mapped to Sγ1 and Sε can be divided into six regions (denoted a–f) in either the − or + orientation:

a | b | c (−)

d | e | f (+)

Junctions within core Sγ1/Sε are represented by the b and e regions for − and + junctions, respectively; the c region (deletional joining, − orientation) and the d region (inversional joining, + orientation) represent joining of bait DSB broken ends to resected acceptor Sγ1/Sε DSBs. Junctions falling into regions a or f represent joining to non-AID-generated de novo breaks of unknown source; they are usually very few and were therefore omitted from the calculation of both resection and orientation bias. Because in most genetic backgrounds other than 53BP1−/−, inversional joins are much rarer than deletions, the level of resection junctions in the d region fluctuates much more than that in the c region. We therefore used the c region to calculate resection in all genotypes as follows:


$$\text{resection rate} = \frac{c}{b + c} \times 100$$


The degree of orientation bias, defined so that it correlates positively with the level of resection, is calculated as the ratio of inversional to deletional joins:

$$\text{bias ratio} = \frac{d + e}{b + c} \times 100$$
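Both quantities are simple ratios of region counts. The sketch below assumes junctions have already been assigned to regions a–f (that assignment depends on S-region coordinates and is not shown); counts for a and f may be present but are ignored, as in the text. The example counts are made up.

```python
def resection_rate(counts):
    """Percent of deletional (-) acceptor junctions that fall in the resected c region."""
    b, c = counts["b"], counts["c"]
    return 100.0 * c / (b + c)


def bias_ratio(counts):
    """Inversional joins (d + e) relative to deletional joins (b + c), times 100."""
    return 100.0 * (counts["d"] + counts["e"]) / (counts["b"] + counts["c"])


# Hypothetical counts; a and f (de novo breaks of unknown source) are not used.
example = {"a": 3, "b": 900, "c": 100, "d": 20, "e": 30, "f": 2}
print(resection_rate(example))  # 10.0
print(bias_ratio(example))  # 5.0
```

A higher bias ratio means relatively more inversional joining, that is, a weaker deletional preference; the ratio is oriented this way so that it rises together with resection.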


To make bar graphs comparing the degree of orientation bias and the level of resection in CSR junctions obtained from libraries with different genetic backgrounds, individual replicate HTGTS libraries were first size-normalized to the replicate with the smallest junction number in the region of interest; resection and bias ratio values from individual experiments were calculated separately, and averages were used for statistical analysis with unpaired two-tailed t-tests. Experiments for each genotype were performed at least three times.
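How the libraries were size-normalized is not specified; one common approach, assumed here, is random downsampling of each replicate's junctions to the smallest replicate's count. The sketch does that with a fixed seed and then computes the unpaired Student's t statistic (pooled variance) on per-replicate values. For the two-tailed P value, `scipy.stats.ttest_ind(x, y)`, whose defaults are the equal-variance, two-sided test, returns the same statistic together with P. Function names and values are illustrative.

```python
import math
import random


def size_normalize(replicates, seed=0):
    """Randomly downsample every replicate to the size of the smallest one."""
    n_min = min(len(r) for r in replicates)
    rng = random.Random(seed)
    return [rng.sample(r, n_min) for r in replicates]


def students_t(x, y):
    """Unpaired two-sample t statistic with pooled variance, plus degrees of freedom."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = sum(x) / nx, sum(y) / ny
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    df = nx + ny - 2
    pooled_var = (ss_x + ss_y) / df
    t = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


# Hypothetical replicate libraries of junction IDs, downsampled to 800 each.
libraries = [list(range(1200)), list(range(800)), list(range(950))]
normalized = size_normalize(libraries)
# Made-up per-replicate bias ratios for two genotypes.
t, df = students_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Fixing the seed makes the downsampling reproducible; repeating it with several seeds would show how much the per-library ratios depend on which junctions are retained.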
Extended Data Figure 1 | Deletional CSR in in vitro activated B cells by DC-PCR; I-SceI DSBs within the Igh constant region locus in activated B cells join with orientation independence. a, Schematic representation of the DC-PCR assay. b, DC-PCR results from anti-CD40/IL4-activated wild-type and 53BP1−/− B cells. c, Schematic representation of the HTGTS method. d, e, HTGTS library analyses of anti-CD40/IL4-stimulated Igh" ek B cells with 3′-broken end (d, red arrow, n = 3) or 5′-broken end (e, blue arrow, n = 3) primers. BE, broken end. f, HTGTS libraries with 5′-broken end primer (blue arrow, n = 3) from ΔSμ^I-SceI/ΔSγ1^I-SceI B cells stimulated with anti-CD40/IL4. g, Bar graph depicting deletion:inversion and excision-circle:inversion ratios between two I-SceI sites and between I-SceI and S region in wild-type versus 53BP1−/− backgrounds. For detailed legends and further discussion, refer to the Supplementary Information.
Extended Data Figure 2 | Genome-wide translocation junctions lack orientation bias; statistical analyses for experimental replicates; orientation-biased joining between I-SceI break in place of Sα and AID-initiated Sμ breaks in CH12F3 cells. a, b, Circos plots for translocation junctions across the whole genome from 3′-broken end (a, n = 4) or 5′-broken end (b, n = 3) HTGTS with anti-CD40/IL4-stimulated ΔSμ^I-SceI/ΔSγ1^I-SceI B cells. c, d, Bar graphs depicting genome-wide percentage of junctions from pooled 3′- and 5′-broken end libraries plotted separately in − or + orientations. Error bars are s.d. e, Joining from the ΔSγ1^I-SceI 3′-broken end to AID off-target DSBs in the Il4ra gene on chromosome 7. f, Bar graph showing the number of junctions (average ± s.d.) recovered from Igh’°** AID−/− 3′-broken end HTGTS libraries (n = 3) at the break site and the upstream Sy’”' prey break as a percentage of the total number of junctions mapped to the 200-kb Igh constant region. Right panel shows the percentage of junctions mapping at Sy’*" (average ± s.d.) over the total Igh junctions that are mapped in the deletional (Del) or inversional (Inv) orientation. The numbers above the bar graph (average ± s.d.) denote the ratio of deletional to inversional junctions. g, Percentage of junctions (average ± s.d.) recovered from the Igh’ S AID−/− 5′-broken end HTGTS libraries (n = 3). h, Percentage of junctions (average ± s.d.) recovered from the ΔSμ^I-SceI/ΔSγ1^I-SceI 3′-broken end libraries (n = 4). i, Percentage of junctions (average ± s.d.) recovered from the ΔSμ^I-SceI/ΔSγ1^I-SceI 5′-broken end libraries (n = 3). j, Percentage of junctions (average ± s.d.) recovered from the wild-type ΔSγ1^I-SceI 3′-broken end libraries (n = 3). k, Percentage of junctions (average ± s.d.) recovered from the ΔSα^I-SceI CH12F3 3′-broken end libraries (n = 3) and the ΔSα^I-SceI Sμ(INV) CH12F3 3′-broken end libraries (n = 3). l, Bar graphs depicting percentage of trans junctions mapping to Sμ in − and + orientations from libraries of ΔSμ^I-SceI/ΔSγ1^I-SceI B cells (n = 3) cloning from ΔSγ1^I-SceI 3′-broken ends. m, Bar graphs depicting percentage of trans junctions mapping to Sμ in − and + orientations and to Sε in − and + orientations from libraries of c-myc^I-SceI 5′-broken ends (n = 3). n, o, HTGTS library analyses of ΔSα^I-SceI CH12F3 cells stimulated with anti-CD40, IL4 and TGF-β and nucleofected with I-SceI expression plasmid. Cells were harvested on day 3 post-stimulation for 3′-broken end (n, n = 6) and 5′-broken end (o, n = 6) libraries. p, 3′- and 5′-broken end libraries are normalized with 'symmetric junctions' (see Supplementary Information). q, Bar graph showing percentage of junctions from ΔSα^I-SceI CH12F3 cells (n = 6) from 3′- and 5′-broken end primers. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 3 | Joining between I-SceI break at Sμ and AID-initiated S-region breaks in lipopolysaccharide (LPS)-activated ΔSμ^I-SceI/ΔSγ1^I-SceI B cells and clustered I-SceI breaks in 45¢7*1/4Sy17°" B cells in place of Sγ1; inverted Sμ in CH12F3 cells supports robust IgA CSR. a, Diagram of Igh locus organization in ΔSμ^I-SceI/ΔSγ1^I-SceI B cells highlighting AID-initiated breaks in the Sγ3, Sγ2b and Sγ2a regions upon LPS stimulation and potential outcomes of CSR in the form of deletion (red, −) and inversional joining (blue, +). b, Plots showing enlarged distribution of pooled prey junctions in a 20-kb region flanking Sγ3, Sγ2b and Sγ2a from HTGTS libraries of ΔSμ^I-SceI/ΔSγ1^I-SceI B cells (n = 3) stimulated with LPS and anti-IgD-dextran and infected with I-SceI-expressing retrovirus. c, Bar graph from three independent ΔSμ^I-SceI/ΔSγ1^I-SceI 5′-broken end libraries showing the percentage of junctions mapped at different S regions. d, Diagram of Igh locus organization in ASy°”" ASy17**" B cells highlighting joining outcomes of I-SceI-mediated bait DSBs at 4Sy°* to clustered I-SceI DSBs at 4Sy17°* in the form of deletion (red, −) and inversional joining (blue, +). e, Pooled prey junctions from independent AS ie" ASy1 8*1 B cell libraries (n = 2, left panel, emulsion-mediated PCR; n = 2, right panel, linear-amplification-mediated HTGTS). f, Southern blot for Sμ inversion on the productive allele of CH12F3 cells with the non-productive allele deleted. g, IgA CSR on day 3 for CH12F3 (non-productive ΔSμ-Sα) cells stimulated with anti-CD40, IL4 and TGF-β. h, IgA CSR of CH12F3 (productive allele Sμ(INV), non-productive allele ΔSμ-Sα) cells stimulated with anti-CD40, IL4 and TGF-β. Two independent clones of CH12F3 (Sμ(INV), non-productive ΔSμ-Sα) are shown. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 4 | Orientation-biased joining between AID-initiated rSμ and downstream AID-initiated S-region breaks in anti-CD40/IL4-activated and LPS-activated Sμ-truncated B cells. a, 150-bp rSμ sequence used as HTGTS bait, with the red arrow indicating the rSμ 5′-broken end HTGTS primer; red and blue vertical lines indicate canonical AGCT or other RGYW AID-targeting motifs, respectively. Distribution of rSμ break points in junctions to downstream S regions recovered from anti-CD40/IL4-stimulated rSμ B cells. b, HTGTS analyses of anti-CD40/IL4-activated rSμ B cells: 5′ rSμ AID-initiated broken end (red primer, n = 3) junctions to AID-initiated DSBs in Sγ1 and Sε, which include deletions (− orientation, red) or inversions (+ orientation, blue). c, HTGTS analyses of anti-CD40/IL4-activated rSμ B cells: 3′ rSμ AID-initiated broken end (blue primer, n = 3) junctions to AID-initiated DSBs in Sγ1 and Sε, which include excision circles (+ orientation, blue) or inversions (− orientation, red). d, Bar graph showing the percentage of junctions (average ± s.d.) from anti-CD40/IL4-activated rSμ 5′-broken end libraries mapped to Sγ1 and Sε. e, Bar graph showing the percentage of junctions (average ± s.d.) from anti-CD40/IL4-activated rSμ 3′-broken end libraries mapped to Sγ1 and Sε. f, HTGTS analyses of LPS-activated rSμ B cells: 5′ rSμ (red primer, n = 3) AID-initiated broken end junctions to AID-initiated DSBs in Sγ3, Sγ2b and Sγ2a, which include deletional joining (− orientation, red) or inversions (+ orientation, blue). g, HTGTS analyses of LPS-activated rSμ B cells: 3′ rSμ (blue primer, n = 3) AID-initiated broken end junctions to AID-initiated DSBs of the above LPS-stimulated cells in Sγ3, Sγ2b and Sγ2a, which include excision circles (+ orientation, blue) or inversions (− orientation, red). h, i, Percentage junction distribution at Sγ3, Sγ2b and Sγ2a in both orientations from 5′-broken end libraries (h) and 3′-broken end libraries (i), shown as average ± s.d. from three independent experiments. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 5 | Orientation-biased joining between rSμ and AID-induced Sα DSBs in CSR-activated CH12F3 cells. a, Diagram outlining potential junction outcomes from 5′ rSμ (red primer) or 3′ rSμ (blue primer) AID-initiated broken end junctions to AID-initiated DSBs in Sα upon anti-CD40, IL4 and TGF-β stimulation of ΔSμ CH12F3 cells. b, c, Top panels show HTGTS library analyses of day 2 (b) and day 3 (c) stimulated CH12F3 (non-productive allele ΔSμ-Sα, productive allele ΔSμ) cells cloning from the 5′-broken end of rSμ (red primer, n = 3), whereas lower panels show HTGTS libraries cloning from the 3′-broken end of rSμ (blue primer, n = 3). d, Bar graph showing percentage of junctions (average ± s.d.) for the 5′-broken end and 3′-broken end libraries indicated in b and c. e, Bar graph with percentage of junctions (average ± s.d.) from rSμ libraries mapped to prey Sα in the deletion (DEL) or inversion (INV) orientation for 5′-broken end libraries and in the excision circle (EC) or inversion orientation for 3′-broken end libraries. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 6 | Levels of junctions to downstream S regions in wild-type and DSBR-deficient 5′Sμ HTGTS libraries correlate with CSR levels; the 5′Sμ break site undergoes variable degrees of resection in stimulated DSBR-deficient B cells. a, Table showing IgG1 and IgE CSR levels of splenic B cells from various genotypes (with number of replicates indicated) activated in vitro with anti-CD40 and IL4. FACS was performed on day 4 and values indicate average ± s.d. WT, wild type. b, Left panel shows a bar graph of the percentage of junctions (average ± s.d.) recovered from wild-type 5′Sμ 5′-broken end libraries mapped to Sμ, Sγ1 and Sε over the total number of junctions identified in the 200-kb Igh constant region. Remaining panels show similar results from different DSBR-deficient backgrounds using the same 5′-broken end primer. c, d, 5′Sμ 5′-broken end HTGTS library analyses from H2AX−/− and Rif1^F/F CD19^cre B cells are shown, respectively. e, Diagram of potential junction outcomes from 5′Sμ AID-initiated 5′-broken end junctions to AID-initiated DSBs in Sγ1 and Sε. f, Data from HTGTS libraries mapped to the 20-kb region flanking the 5′Sμ break site from B cells stimulated with anti-CD40/IL4 in wild-type and DSBR-deficient backgrounds. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 7 | Orientation-biased joining between rSμ and AID-induced Sγ1 and Sε DSBs in wild-type, ATM-deficient and 53BP1-deficient B cells. a, Diagram of potential junction outcomes from 5′ rSμ AID-initiated broken end junctions to AID-initiated DSBs in Sγ1 and Sε as described earlier. b–d, Linear plots of pooled junctions across the 200-kb Igh constant region (first panel), the 20-kb region flanking the rSμ break site (second panel), the 20-kb region flanking downstream Sγ1 (third panel) and Sε (last panel) from three independent wild-type (b), ATM-deficient (c) or 53BP1-deficient (d) 5′-broken end rSμ libraries. e, Bar graphs showing inversion:deletion (INV/DEL) bias ratios of HTGTS Sγ1 (left panel) and Sε (right panel) junctions in different genotypes, showing average ± s.d. from three libraries for each genotype. P values calculated by unpaired two-tailed t-tests. f, Bar graphs showing percentage of long resection of Sγ1 (left) and Sε (right) junctions in different genotypes as average ± s.d. n.s., not significant. P values calculated by unpaired two-tailed t-tests. g, Bar graphs showing the number of junctions (average ± s.d.) recovered from the above experiments from 5′-broken end HTGTS libraries for the indicated genotypes (n = 3 for each) at the rSμ break site and the downstream Sγ1 and Sε regions, as a percentage of the total number of junctions mapped to the 200-kb Igh constant region. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 8 | Orientation-biased joining of I-SceI DSBs at Sγ1 to AID-induced S-region breaks in various DSBR-deficient backgrounds. a, Schematic illustration of the ΔSγ1^I-SceI allele and joining outcomes from the 3′-broken end (red arrow) to AID-initiated upstream Sμ and downstream Sε DSBs. b–e, Linear distribution of junctions between the ΔSγ1^I-SceI 3′-broken end and AID-induced Sμ/Sε region DSBs in anti-CD40/IL4-stimulated wild-type (b, n = 4), ATM−/− (c, n = 3), H2AX−/− (d, n = 3) and 53BP1−/− (e, n = 4) cells across the 200-kb Igh region (left panels), 10-kb Sμ (middle panels) and Sε (right panels). f, Bar graphs (average ± s.d.) showing the percentage of junctions mapped at ΔSγ1^I-SceI (break site), Sμ and Sε over the total number of junctions in the 200-kb Igh constant region in different genotypes. g, Bar graphs showing the percentage of junctions from the above genotypes of ΔSγ1^I-SceI 3′-broken end libraries mapped to Sμ and Sε as average ± s.d. h, Bar graph (average ± s.d.) comparing percentages of junctions cloned using the ΔSγ1^I-SceI 3′-broken end that involve resection of Sμ (top panel) and Sε (bottom panel) breaks in the indicated backgrounds. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Figure 9 | Inhibition of resection in 53BP1-deficient B cells by an ATM inhibitor (ATMi) does not rescue directional CSR joining to Sγ1. a–d, Linear plots of pooled junctions across the 200-kb Igh constant region (left panels), the 20-kb region flanking downstream Sγ1 (middle panels) and Sε (right panels) from wild type plus DMSO control (a, n = 2), wild type plus ATMi (b, n = 3), 53BP1−/− plus DMSO (c, n = 3) and 53BP1−/− plus ATMi (d, n = 3) libraries, shown as above. e, Bar graph showing the percentage of Sμ, Sγ1 and Sε junctions (average ± s.d.) from wild type plus DMSO 5′Sμ libraries (left) and wild type plus ATMi 5′-broken end libraries (right, n = 3). f, Bar graph showing the percentage of Sμ, Sγ1 and Sε junctions (average ± s.d.) from 53BP1−/− plus DMSO (left) and 53BP1−/− plus ATMi (right) 5′Sμ libraries. For detailed legends and further discussion refer to the Supplementary Information.
Extended Data Table 1 | Statistical comparison for orientation bias and resection in Sγ1 and Sε junctions from wild-type and DSBR-deficient B cell libraries

a                     WT        ATM−/−    H2AX−/−   53BP1−/−  53BP1−/− + Ai  Rif1−/−
WT (n=5)
ATM−/− (n=3)          <0.0001
H2AX−/− (n=3)         0.0057    0.0031
53BP1−/− (n=8)        <0.0001   <0.0001   <0.0001
53BP1−/− + Ai (n=3)   <0.0001   0.0039    0.001     0.1481
Rif1−/− (n=3)         <0.0001   0.181     0.0094    <0.0001   0.0029

b                     WT        ATM−/−    H2AX−/−   53BP1−/−  53BP1−/− + Ai  Rif1−/−
WT (n=5)
ATM−/− (n=3)          <0.0001
H2AX−/− (n=3)         0.0034    0.0291
53BP1−/− (n=8)        <0.0001   0.0003    0.0002
53BP1−/− + Ai (n=3)   N/A       N/A       N/A       N/A
Rif1−/− (n=3)         0.0012    0.9274    0.1154    0.0003    N/A

c                     WT        ATM−/−    H2AX−/−   53BP1−/−  53BP1−/− + Ai  Rif1−/−
WT (n=5)
ATM−/− (n=3)          0.0002
H2AX−/− (n=3)         0.0037    0.3012
53BP1−/− (n=8)        <0.0001   0.0002    0.0008
53BP1−/− + Ai (n=3)   <0.0001   0.158     0.7974    0.0005
Rif1−/− (n=3)         N/A       N/A       N/A       N/A       N/A

d                     WT        ATM−/−    H2AX−/−   53BP1−/−  53BP1−/− + Ai  Rif1−/−
WT (n=5)
ATM−/− (n=3)          <0.0001
H2AX−/− (n=3)         <0.0001   0.0279
53BP1−/− (n=8)        <0.0001   0.0001    0.0003
53BP1−/− + Ai (n=3)   N/A       N/A       N/A       N/A
Rif1−/− (n=3)         <0.0001   0.0199    0.058     0.0062    N/A


a, b, P values calculated by unpaired two-tailed t-test for the degree of bias in the Sγ1 (a) and Sε (b) junctions between wild-type and DSBR-deficient B cells with full-length Sμ for experiments described in Figs 3 and 4. Orientation bias was calculated as described in the Methods for individual libraries. Numbers in parentheses indicate independent experiments performed for each genotype. N/A, not available. c, d, P values calculated by unpaired two-tailed t-test for the level of resection in the Sγ1 (c) and Sε (d) junctions for experiments described above. Percentage of resection was calculated as described in the Methods for individual libraries. Numbers in parentheses indicate independent experiments performed for each genotype. Ai, ATM inhibitor.
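The P values in Extended Data Table 1 come from unpaired two-tailed t-tests on per-library orientation-bias or resection values. As a rough illustration, the pure-Python sketch below computes such a P value. It assumes Student's pooled-variance form (the legend does not say whether equal variances were assumed; Welch's test would use a different standard error and degrees of freedom), evaluates the t distribution by numerical integration instead of calling a statistics library, and uses made-up input values. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 20_000) -> float:
    """P(|T| >= |t|), using Simpson's rule for the area between 0 and |t|."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2.0 * total * h / 3.0  # P(-|t| <= T <= |t|), by symmetry
    return max(0.0, 1.0 - central)


def unpaired_t_test(x, y):
    """Student's unpaired t-test with pooled variance; returns (t, df, P)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    t = (mean(x) - mean(y)) / se
    return t, df, two_tailed_p(t, df)


# Hypothetical per-library values for two genotypes (not data from the table).
t, df, p = unpaired_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df}, P = {p:.4f}")
```

In practice a tested library routine would be preferable; the numerical integration here only keeps the example self-contained.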
A four-helix bundle stores copper for methane oxidation
Kevin J. Waldron¹ & Christopher Dennison¹


Methane-oxidizing bacteria (methanotrophs) require large quantities of copper for the membrane-bound (particulate) methane monooxygenase. Certain methanotrophs are also able to switch to using the iron-containing soluble methane monooxygenase to catalyse methane oxidation, with this switchover regulated by copper. Methane monooxygenases are nature’s primary biological mechanism for suppressing atmospheric levels of methane, a potent greenhouse gas. Furthermore, methanotrophs and methane monooxygenases have enormous potential in bioremediation, in biotransformations producing bulk and fine chemicals, and in bioenergy, particularly considering increased methane availability from renewable sources and hydraulic fracturing of shale rock. Here we discover and characterize a novel copper storage protein (Csp1) from the methanotroph Methylosinus trichosporium OB3b that is exported from the cytosol and stores copper for particulate methane monooxygenase. Csp1 is a tetramer of four-helix bundles, with each monomer binding up to 13 Cu(I) ions in a previously unseen manner via mainly Cys residues that point into the core of the bundle. Csp1 is the first example of a protein that stores a metal within an established protein-folding motif. This work provides detailed insight into how methanotrophs accumulate copper for the oxidation of methane. Understanding this process is essential if the wide-ranging biotechnological applications of methanotrophs are to be realized. Cytosolic homologues of Csp1 are present in diverse bacteria, thus challenging the dogma that such organisms do not use copper in this location.

Biology exploits copper to catalyse a range of important reactions, but use of this metal has been influenced by its availability and potential toxicity. In eukaryotes, excess copper is safely stored by cytosolic Cys-rich metallothioneins. Prokaryotic cytosolic copper-requiring enzymes are not currently known, and most prokaryotes are thought not to possess copper storage proteins. Copper-binding metallothioneins, such as those in eukaryotes, have been identified in pathogenic mycobacteria, where they help detoxify Cu(I). Methanotrophs are Gram-negative bacteria that produce specialized membranes to harbour particulate methane monooxygenase (pMMO); these membranes may be either discrete from or connected to the plasma membrane. These organisms have the typical machinery for copper efflux from the cytosol, but also possess the only currently characterized prokaryotic copper-uptake system: secreted modified peptides called methanobactins (mbtins), which bind Cu(I) with high affinity, localize to the cytoplasm, and are involved in the switchover from soluble MMO (sMMO) to pMMO.

While searching for internalized Cu(I)-mbtin in the switchover methanotroph M. trichosporium OB3b, we detected large amounts of soluble, protein-associated copper. This was unexpected, because bioinformatic analysis predicts the transcriptional activator CueR to be the only soluble copper protein. To identify the copper-binding proteins in the most abundant copper pool, soluble extracts from cells grown under elevated copper were separated by anion-exchange and gel-filtration chromatography. Copper peaks match the intensity profiles of a protein band just below the 14.4 kDa marker (Fig. 1a, b), which was purified to near homogeneity (Extended Data Fig. 1a, b and Fig. 1c). This band was excised (Fig. 1b, c) and identified by peptide mass fingerprinting as an uncharacterized conserved hypothetical protein (herein Csp1, Extended Data Fig. 1c). The mature protein (122 amino acids) contains 13 Cys residues and has a predicted molecular mass of 12,591.4 Da, consistent with its migration on sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) (Fig. 1).

Recombinant apo-Csp1 (12,589.8 Da) elutes from a gel-filtration column with an apparent molecular mass of ~50 kDa (Fig. 2a), indicating
Figure 1 | Identification and purification of Csp1 from M. trichosporium OB3b. a, Copper content of anion-exchange fractions (NaCl gradient shown as a dashed line) of extract from M. trichosporium OB3b cells and the SDS-PAGE analysis of fractions 20–33. b, Copper content and SDS-PAGE analysis of the purification of the fraction containing the highest copper concentration (fraction 28) from a on a G100 gel-filtration column. A similar anion-exchange fraction (Extended Data Fig. 1a) was purified on a Superdex 75 column (Extended Data Fig. 1b), with the copper content and SDS-PAGE analyses of eluted fractions shown in c. The band of interest that migrates below the 14.4 kDa marker is indicated in each panel with an arrow, and protein identification was performed on the bands from the 7.0 ml (b) and 12.0 ml (c) fractions.
Figure 2 | The structure of apo-Csp1. a, Analytical gel-filtration chromatograms of apo-Csp1 (red line) and protein to which 14.0 molar equivalents of Cu(I) were added (blue line) for samples (100 μM when injected) in 20 mM HEPES pH 7.5 containing 200 mM NaCl. The absorbance was monitored at 280 nm, with the values for Cu(I)-Csp1 divided by 10 (see Extended Data Fig. 2a, b). Inset, SDS-PAGE analysis of the purified protein. b, Far-UV circular dichroism spectra of apo-Csp1 (red line) and Csp1 plus 14.0 equivalents of Cu(I) (blue line) at 39.6 and 35.7 μM, respectively, in 100 mM phosphate pH 8.0. c, The tetrameric arrangement in the asymmetric unit of the crystal structure of apo-Csp1, with the side chains of the Cys residues that point into the core of the four-helix bundle shown as sticks for one monomer in d. The opening into the core of the four-helix bundle is facing out in d, and involves His36, Met40, Met43 (on the extended 1) and Met48.


a tetramer in solution; the native protein elutes at a similar volume (compare Fig. 2a and Extended Data Fig. 1b), indicating the same quaternary structure. The far-ultraviolet (UV) circular dichroism spectrum of Csp1 (Fig. 2b) reveals predominantly α-helical (~80%) secondary structure. The asymmetric unit in the crystal structure of apo-Csp1 (Extended Data Table 1) consists of a tetramer of four-helix bundles (~75% α-helix), involving two sets of adjacent monomers aligned in an anti-parallel manner, with pairs of monomers rotated by ~55° (Fig. 2c). The major contact areas between monomers are ~750–820 Å², consistent with the crystallographic tetramer being present in solution. All 13 Cys residues of apo-Csp1 point into the core of the four-helix bundle, and none are involved in disulfide bonds (Fig. 2d). One end of the bundle, at which the amino (N) and carboxy (C) termini are found, is relatively hydrophobic, whereas some Cys residues appear accessible at the opposite end of the molecule (Fig. 2d).
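As a rough consistency check on the oligomeric state, the arithmetic can be written out: dividing the ~50 kDa apparent mass by the measured monomer mass (12,589.8 Da) gives a ratio close to 4, and four monomers with up to 13 Cu(I) ions each give a capacity of 52 Cu(I) ions per tetramer. The sketch below is illustrative only. Gel-filtration masses depend on molecular shape, so rounding the ratio to a whole number of subunits is an assumption, not a rigorous determination.

```python
MONOMER_DA = 12_589.8   # measured mass of recombinant apo-Csp1 (from the text)
APPARENT_DA = 50_000.0  # approximate gel-filtration estimate (~50 kDa)
CU_PER_MONOMER = 13     # Cu(I) ions per monomer in the Cu(I)-Csp1 structure


def oligomeric_state(apparent_da: float, monomer_da: float) -> int:
    """Round the apparent/monomer mass ratio to a whole number of subunits."""
    return round(apparent_da / monomer_da)


subunits = oligomeric_state(APPARENT_DA, MONOMER_DA)
tetramer_da = subunits * MONOMER_DA
capacity = subunits * CU_PER_MONOMER
print(subunits, f"{tetramer_da / 1000:.1f} kDa", capacity)  # 4 50.4 kDa 52
```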

Csp1 is isolated from M. trichosporium OB3b with copper bound (Fig. 1 and Extended Data Fig. 1a), which, along with the arrangement of the Cys residues in the structure of the apo-protein (Fig. 2d), suggests a function in Cu(I) storage. To test this hypothesis, Cu(I) binding was analysed in vitro by monitoring the appearance of S(Cys)→Cu(I) charge transfer bands (Extended Data Fig. 2a), giving a stoichiometry of ~11–15 Cu(I) equivalents per Csp1 monomer (Extended Data Fig. 2b). In the presence of relatively low concentrations of the high-affinity chromophoric Cu(I) ligand bicinchoninic acid (BCA; log β₂ = 17.7 (ref. 25)), apo-Csp1 binds all Cu(I) until ~12–14 equivalents have been added (see, for example, Fig. 3a). Furthermore, an apo-Csp1 sample incubated with ~25 equivalents of Cu(I) elutes from a gel-filtration column with ~12–13 equivalents of Cu(I) (Fig. 3b). The binding of Cu(I) has no significant effect on either the secondary or quaternary structure (Fig. 2a, b). The crystal structure of Cu(I)-Csp1 is shown in Fig. 3c (Extended Data Table 1). The anomalous difference density for data collected just below the
Figure 3 | Cu(I)-binding by Csp1. a, Plot of [Cu(BCA)₂]³⁻ concentration against the [Cu(I)]/[Csp1] ratio upon titrating Cu(I) into apo-Csp1 (2.43 μM) in the presence of 103 μM BCA (same buffer as for Fig. 2a). [Cu(BCA)₂]³⁻ starts forming after ~12 equivalents of Cu(I) are added. b, Analytical gel-filtration chromatogram of Csp1 (116 μM) mixed with ~25 equivalents of Cu(I) in the same buffer. Csp1 (Bradford, red squares), copper (atomic absorption spectroscopy, blue triangles) and Cu(I) (BCS in the presence of 7.6 M urea, open cyan circles) concentrations are shown. The main Csp1-containing fractions bind 11.8–12.9 equivalents of Cu(I). c, The structure of Cu(I)-Csp1 (chain A) including the anomalous difference density for copper contoured at 3.5σ (orange mesh). The copper ions (Cu1–Cu13 correspond to A1123–A1135 in the PDB file 5AJF) are represented as dark grey spheres, and the side chains of Cys, and other key residues, as sticks. The coordination of Cu(I) ions at the two ends of the four-helix bundle is shown in d and e, with bond distances (in Å) in red.


copper edge identifies 13 copper ions within the core of the four-helix bundle (Fig. 3c), bound predominantly by the 13 Cys residues. The oxidation state of copper in Csp1 crystals was analysed using X-ray absorption near-edge spectroscopy (Extended Data Fig. 2c). A well-defined peak at 8,984 eV, due to the Cu 1s→4p transition, is consistent with two- or three-coordinate Cu(I).
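The ~12-equivalent end point noted above for the titration in Fig. 3a is where [Cu(BCA)₂]³⁻ starts to form, that is, where Csp1 stops out-competing BCA for Cu(I). The text does not say how the break point was read off; one common approach, sketched below under that assumption, fits a straight line to the rising branch of the titration and takes its x-intercept. The data are synthetic (2.43 μM Csp1 saturating at exactly 12 equivalents), and the function name and threshold are hypothetical.

```python
def titration_break_point(ratios, signal, threshold):
    """Estimate a titration end point as the x-intercept of a least-squares line
    through the points whose signal exceeds `threshold` (the rising branch)."""
    pts = [(x, y) for x, y in zip(ratios, signal) if y > threshold]
    if len(pts) < 2:
        raise ValueError("need at least two points above the threshold")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    return -intercept / slope


# Synthetic titration: [Cu(BCA)2] (uM) stays at zero until 12 equivalents, then
# rises by 2.43 uM (the Csp1 concentration) per additional equivalent.
CSP1_UM = 2.43
ratios = [0.5 * i for i in range(41)]  # 0 to 20 Cu(I) equivalents
signal = [max(0.0, CSP1_UM * (r - 12.0)) for r in ratios]
print(round(titration_break_point(ratios, signal, threshold=0.05), 2))  # 12.0
```

With real data the points just after the break are often curved, so the choice of which points to include in the fit matters more than it does here.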

The 13 Cu(I) ions are distributed throughout the core of the four-helix bundle of Csp1 (Fig. 3c–e). Ten of the Cu(I) sites involve bis-Cys ligation (μ-S for all the Cys ligands), with Cu(I)–S(Cys) bond lengths and S(Cys)–Cu–S(Cys) angles ranging from 2.0 to 2.3 Å and from 142° to 177°, respectively (in chain A). The exceptions are Cu11 and Cu13 (Fig. 3e, see below), and Cu4 (Fig. 3d), which has three coordinating thiolates (S(Cys)–Cu–S(Cys) angles ranging from 90° to 145°). Ten Cu(I) ions are within 2.7 Å of a neighbouring metal (2.8 Å for Cu5 and Cu13, and 2.9 Å for Cu9), with some interacting with more than one Cu(I) (notably Cu7 and Cu10). Four of the Cu(I) sites (Cu1, Cu6, Cu8 and Cu12) are coordinated by the Cys residues from CXXXC motifs (Fig. 3c–e and Extended Data Figs 1c and 3), with the backbone carbonyl of the first Cys ligand close (2.1–2.3 Å) to the Cu(I) ion. At other Cu(I) sites, the Cys ligands originate from adjacent α-helices (for example, Cu2, Cu3 and Cu4 in Fig. 3d and Cu10 in Fig. 3e). Cu11 is ligated by two thiolates and Met48 (2.3 Å), with bond angles ranging from 102° to 142° (Fig. 3e). Cu13 is also coordinated (Fig. 3e) by Met48 (2.6 Å), as well as by His36 (imidazole N, 2.0 Å) and Cys37 (2.2 Å). These two atypical Cu(I) sites (Cu11 and Cu13) are found at the open end of the bundle and, together with the nearby Met40 and Met43 (Fig. 3c), may help to recruit the metal.
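The Cu–S distances and S–Cu–S angles quoted above are derived from refined atomic coordinates (PDB 5AJF). The sketch below shows the underlying geometry for a single site. The coordinates are invented for illustration and are not taken from 5AJF, and the 2.5 Å cutoff used to count coordinating atoms is an arbitrary assumption, not a criterion from the paper.

```python
import math


def angle_deg(a, centre, b):
    """Angle a-centre-b in degrees, from Cartesian coordinates in angstroms."""
    u = [p - q for p, q in zip(a, centre)]
    v = [p - q for p, q in zip(b, centre)]
    cos_theta = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def coordinating_atoms(metal, ligands, cutoff=2.5):
    """Names of ligand atoms within `cutoff` angstroms of the metal (hypothetical cutoff)."""
    return [name for name, xyz in ligands.items() if math.dist(metal, xyz) <= cutoff]


# Invented bis-thiolate Cu(I) site: two S atoms 2.2 A from Cu, 165 degrees apart,
# plus a more distant atom that should not be counted as a ligand.
cu = (0.0, 0.0, 0.0)
theta = math.radians(165.0)
ligands = {
    "SG_a": (2.2, 0.0, 0.0),
    "SG_b": (2.2 * math.cos(theta), 2.2 * math.sin(theta), 0.0),
    "SD_distant": (0.0, 0.0, 3.4),
}
print(coordinating_atoms(cu, ligands))                            # ['SG_a', 'SG_b']
print(round(angle_deg(ligands["SG_a"], cu, ligands["SG_b"]), 1))  # 165.0
```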

Metal storage within an established protein-folding motif has not previously been observed. Iron is stored by ferritins using polymeric four-helix bundles, but with monomers forming a protein envelope that surrounds a ferric-oxide mineral core. Storing multiple Cu(I) ions within a four-helix bundle in Csp1 provides a stark contrast to unstructured apo-metallothioneins, which fold around metal clusters. For example, a truncated form of yeast metallothionein binds a Cu(I)₈-thiolate cluster using ten Cys residues, with six three-coordinate and two two-coordinate sites. A four-helix bundle is formed upon Cu(I) addition to a synthetic peptide possessing a CXXC motif, and binds a multinuclear Cu(I)–thiolate cluster. The arrangement of the Cu(I) ions within Csp1 is unprecedented in biology and inorganic chemistry.

Tetrameric Csp1 is capable of binding up to 52 Cu(I) ions, consistent with a role in copper storage. The major copper-requiring protein in M. trichosporium OB3b is pMMO. Regardless of the uncertainty about the structure of the specialized membranes that house pMMO, cytosolic copper must cross a membrane before incorporation into this enzyme. Csp1 and its closely related homologue Csp2 possess signal peptides (Extended Data Figs 1c and 3) that are predicted to target the twin-arginine translocation (Tat) machinery, and the proteins are therefore expected to be located outside the cytosol. To test whether Csp1 and Csp2 store copper for pMMO, the Δcsp1/csp2 double mutant strain of M. trichosporium OB3b was constructed. Switchover to sMMO for cells transferred from high to low copper is significantly faster in Δcsp1/csp2 than in the wild-type strain, and sMMO activity is 1.8 times greater in the mutant after almost 28 h (Extended Data Fig. 4). These data are compatible with Csp1 and Csp2 storing, and potentially also chaperoning, copper for pMMO, thus allowing M. trichosporium OB3b to continue using this enzyme for growth on methane when copper becomes limiting, although this hypothesis has not been explicitly tested.

An important attribute of a metal storage protein is its metal affinity. As the concentration of BCA is increased, BCA begins to withhold Cu(I) from Csp1 (Fig. 4a and Extended Data Fig. 5a–f). Buffering of free Cu(I) by ligands such as BCA and the tighter chromophoric Cu(I) ligand bathocuproine disulfonate (BCS; log β₂ = 20.8 (ref. 25); see Extended Data Fig. 6a, b) has been used to obtain an average Cu(I) affinity for Csp1 of ~1 × 10¹⁷ M⁻¹ (Fig. 4b, c and Extended Data Fig. 6c, d). Mbtin, the copper-chelating ligand produced by M. trichosporium OB3b, has a much tighter Cu(I) affinity and stoichiometrically removes Cu(I) from Csp1 in ~1 h (Fig. 4d and Extended Data Fig. 7a–d). This high affinity makes copper removal from imported mbtin potentially problematic (for example, apo-Csp1 cannot directly acquire Cu(I) from mbtin; Extended Data Fig. 7e), and Cu(I)-mbtin may need to be degraded within a cell to release its metal cargo. Csp1 is present at elevated copper levels in M. trichosporium OB3b and sequesters the metal (Fig. 1 and Extended Data Fig. 1a), whereas mbtin production is suppressed under these conditions. As copper levels in the cell decrease, mbtin will be produced and could play a role in removing and using Csp1-bound copper.
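The average affinity in Fig. 4c comes from fitting fractional occupancy θ against the free Cu(I) concentration [Cu] with the Hill equation, θ = [Cu]ⁿ / (K_Dⁿ + [Cu]ⁿ). The sketch below uses the reported parameters (K_D = 7.5 × 10⁻¹⁸ M, n = 3.1) to generate noise-free occupancies and then recovers them. For brevity it swaps in a linearized Hill plot, log[θ/(1 − θ)] = n log[Cu] − n log K_D, instead of the nonlinear fit used in the paper; with real, noisy data a nonlinear least-squares fit is generally preferable, because the log transform distorts errors near θ = 0 and θ = 1. Function names are hypothetical.

```python
import math

KD_M = 7.5e-18  # average dissociation constant reported in Fig. 4c
HILL_N = 3.1    # Hill coefficient reported in Fig. 4c


def hill_occupancy(cu_free, kd, n):
    """Fractional occupancy theta predicted by the Hill equation."""
    return cu_free ** n / (kd ** n + cu_free ** n)


def fit_hill_plot(cu_free, theta):
    """Estimate (kd, n) by linear regression of log10(theta/(1-theta)) on log10([Cu]free)."""
    xs = [math.log10(c) for c in cu_free]
    ys = [math.log10(t / (1.0 - t)) for t in theta]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx  # equals -n * log10(kd)
    return 10 ** (-intercept / slope), slope


cu_free = [10 ** e for e in (-18.6, -18.4, -18.2, -18.0, -17.8, -17.6)]
theta = [hill_occupancy(c, KD_M, HILL_N) for c in cu_free]
kd_fit, n_fit = fit_hill_plot(cu_free, theta)
print(f"K_D = {kd_fit:.2e} M, n = {n_fit:.2f}, K_a = {1 / kd_fit:.1e} M^-1")
```

The association constant 1/K_D ≈ 1.3 × 10¹⁷ M⁻¹ is consistent with the average Cu(I) affinity of ~1 × 10¹⁷ M⁻¹ quoted in the text.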

Another Csp1 homologue, Csp3, is also encoded in the M. trichosporium OB3b genome (Extended Data Fig. 3), and Csp3 homologues are widespread in bacteria (Extended Data Fig. 8). These include the functionally uncharacterized proteins from Nitrosospira multiformis and Pseudomonas aeruginosa (apo-structures are available in the Protein Data Bank (PDB) under accession numbers 3LMF and 3KAW), which also have all Cys residues pointing into the core of four-helix bundles. Csp3 has no signal peptide (Extended Data Fig. 3) and therefore, unlike Csp1 and Csp2, is presumably cytosolic. If Csp1 and Csp2 are exported via the Tat system, they would fold in the cytosol, perhaps to prevent disulfide formation. Tat export might also imply copper acquisition before export, allowing transport of large amounts of copper away from systems that
Figure 4 | Cu(I) affinity of Csp1 and Cu(I) release. a, Plots of [Cu(BCA)₂]³⁻ concentration against the [Cu(I)]/[Csp1] ratio for mixtures of apo-Csp1 (3.57 μM) and Cu(I) in the presence of 120 μM (filled squares), 300 μM (open circles), 600 μM (open triangles), 900 μM (cross) and 1,200 μM (‘+’ symbol) BCA (all data acquired after incubation for 41 h). b, Plot of [Cu(BCA)₂]³⁻ concentration against the [Cu(I)]/[Csp1] ratio for mixtures of apo-Csp1 (3.61 μM) and Cu(I) in the presence of 1,210 μM BCA (open squares) for 20 h, along with the data from a at 1,200 μM BCA (filled squares). c, Fractional occupancy of Cu(I)-binding sites in Csp1 (maximum value 11.7 equivalents) at different concentrations of free Cu(I) from the data shown in b at 1,210 μM BCA up to a [Cu(I)]/[Csp1] ratio of 19.2. The solid line shows the fit of the data to the nonlinear Hill equation, giving an average dissociation constant for Cu(I), K_D, of (7.5 ± 0.1) × 10⁻¹⁸ M (n = 3.1 ± 0.2; see Extended Data Fig. 6c, d). d, Plots of the absorbance at 394 nm (spectra in Extended Data Fig. 7a–d) against time after the addition of Cu(I)-Csp1 (1.02 μM, loaded with 13.0 equivalents of Cu(I)) to 13.4 μM (filled squares) and 27.4 μM (filled circles) apo-mbtin, and of Cu(I) (13.3 μM) to 13.4 μM (open squares) and 27.1 μM (open circles) apo-mbtin. All experiments were performed in the same buffer as for Fig. 2a.


remove this metal from the cytosol (CueR and copper-transporting ATPases) and into the same compartment housing pMMO. Csp1 (and Csp2) can store large quantities of copper for, and potentially deliver the metal to, pMMO, an enzyme of great environmental importance that has tremendous biotechnological potential for the utilization and mitigation of methane. By analogy, Csp3 would be predicted to store copper in the cytosol, not only in M. trichosporium OB3b but also in many other bacteria. This raises the possibility that bacteria contain cytosolic copper-requiring enzymes that are still to be discovered.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Identification and purification of copper proteins from M. trichosporium OB3b. M. trichosporium OB3b cultures were grown as described previously at 27 °C in a 5 l fermentor agitated at 250 r.p.m. in nitrate minimal salts (NMS) medium supplemented with 10 μM iron and typically 5 μM copper. Cultures were analysed for sMMO activity as described previously. Cells were harvested at an absorbance at 600 nm (A₆₀₀ nm) typically between 1.1 and 2.2 by centrifugation (4 °C) at 9,000g, and pellets were washed with 20 mM HEPES pH 8.8 followed by the same buffer containing 10 mM EDTA. The cell pellet was resuspended in 20 mM HEPES pH 8.8 and lysed by freeze grinding in liquid nitrogen. The lysate was allowed to thaw in an anaerobic chamber (Belle Technology, [O₂] <2 ppm) before loading into ultracentrifuge tubes, which were sealed in the anaerobic chamber and centrifuged at 160,000g for 1 h at 10 °C. The supernatant was recovered inside the anaerobic chamber, diluted fivefold and loaded (either 59 or 90 mg protein from 10 and 16 l of cells, respectively) onto a 5 ml HiTrap Q HP anion-exchange column (GE Healthcare). For the purification of extracts from 10 l of cells, the HiTrap column was eluted with a linear NaCl gradient (0–500 mM) inside the anaerobic chamber with a homemade mixing device (total volume 80 ml). For the 16 l preparation, the HiTrap column was eluted on an Akta Purifier with a linear NaCl gradient (0–250 mM) using thoroughly degassed and nitrogen-purged buffers (total volume 80 ml). Copper content in eluted fractions (1 ml) was measured by inductively coupled plasma mass spectrometry (Thermo Electron, X series). Samples were diluted tenfold in 2% nitric acid containing 20 μg l⁻¹ silver as internal standard, and analysed for Cu and Ag in standard mode (100 reads, 30 ms dwell, 3 channels, 0.02 atomic mass unit separation, in triplicate). Copper concentration was determined by comparison with matrix-matched elemental standard solutions. Copper-containing fractions were analysed by SDS-PAGE using Oriole fluorescent gel stain (Bio-Rad). All images of fluorescently labelled gels were inverted to make bands clearer in print.
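The silver added to each ICP-MS sample serves as an internal standard. A common way to use it, sketched below as an assumption (the text does not describe the data processing), is to divide each copper signal by the silver signal, build a linear calibration of that ratio against the matrix-matched copper standards, and multiply the interpolated value by the tenfold dilution. All counts and standard concentrations are invented, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def copper_in_fraction(cu_counts, ag_counts, slope, intercept, dilution=10.0):
    """Copper in the undiluted fraction from internal-standard-corrected counts."""
    ratio = cu_counts / ag_counts  # normalizes for drift and matrix effects
    return (ratio - intercept) / slope * dilution


# Invented calibration: Cu/Ag signal ratio for standards of 0-50 ug/l copper.
std_ug_per_l = [0.0, 10.0, 20.0, 50.0]
std_ratio = [0.001, 0.201, 0.401, 1.001]
slope, intercept = linear_fit(std_ug_per_l, std_ratio)
print(round(copper_in_fraction(50_100, 100_000, slope, intercept), 1))  # 250.0
```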

Gel-filtration chromatography of copper-containing fractions was performed either on a Sephadex G100 (Sigma) packed column (1 cm × 20 cm) inside the anaerobic chamber, or on a Superdex 75 10/300 GL (GE Healthcare) column, in 20 mM HEPES pH 7.5 plus 200 mM NaCl (thoroughly degassed and nitrogen-purged for the Superdex 75 column, which was attached to an Akta Purifier), at flow rates of 0.35 and 0.8 ml min⁻¹, respectively (fraction size 1 ml). Proteins whose intensity on SDS-PAGE gels correlated with copper concentration profiles were excised from gels and underwent peptide mass fingerprinting. Digestion with trypsin was performed at an E:S ratio of 1:100 overnight in 50 mM NH₄HCO₃ pH 8. The resultant peptides were resuspended in 0.1% aqueous trifluoroacetic acid and desalted using C18 ZipTips (Millipore). Peptides were then separated on a NanoAcquity liquid chromatography system (Waters) using a 75 μm × 100 mm C18 capillary column (Waters). A linear gradient from 1 to 50% acetonitrile in 0.1% aqueous formic acid was applied over 30 min at a flow rate of 0.3 μl min⁻¹. Eluted peptides were detected using a linear trap quadrupole Fourier transform mass spectrometer (ThermoElectron) in positive ionization mode, with scans over 300–1,500 m/z in data-dependent mode and a Fourier transform mass spectrometry (MS) resolution setting of 50,000. The top five ions in the parent scan were subjected to MS/MS analysis in the linear ion trap region. The proteins from which detected peptides originated were identified using the Mascot MS/MS ion search tool (Matrix Science) by comparison against the entire database of proteobacteria in NCBI. SignalP and TatP were used to identify putative signal sequences.
Cloning the csp1 gene. The csp1 gene without its predicted signal peptide (that is, Gly25 to Ala144, see Extended Data Fig. 1c) was amplified from M. trichosporium OB3b genomic DNA using primers Csp1_F and Csp1_R (Extended Data Table 2) and cloned into pGEMT, which introduced a Met residue at the N terminus. Both strands of the gene were verified by sequencing, and the gene was subsequently cloned into the NdeI and NcoI sites of pET29a (pET29a_Csp1).

Expression and purification of Csp1. Escherichia coli BL21 (DE3) transformed with pET29a_Csp1 was grown in LB medium at 37 °C (100 μg ml⁻¹ kanamycin) until an A₆₀₀ nm of ~0.6. Cells were induced with 1 mM isopropyl β-D-1-thiogalactopyranoside, harvested after 6 h and stored at −20 °C. Pellets were resuspended in 20 mM Tris pH 8.5 plus 1 mM dithiothreitol (DTT), sonicated and centrifuged at 40,000g for 30 min. The supernatant was diluted fivefold in 20 mM Tris pH 8.5 containing 1 mM DTT and loaded onto a HiTrap Q HP column (5 ml) equilibrated in the same buffer. Proteins were eluted with a linear NaCl gradient (0–300 mM, total volume ~200 ml). Csp1-containing fractions, identified by SDS-PAGE, were combined, diluted tenfold in 10 mM Tris pH 7.5 plus 1 mM DTT, applied to a HiTrap Q HP column (5 ml) equilibrated in the same buffer and eluted using a linear NaCl gradient (0–200 mM, total volume ~200 ml). The purest fractions, identified by SDS-PAGE, were combined and thoroughly exchanged into 20 mM HEPES pH 7.5 plus 200 mM NaCl using either a stirred cell or a centrifugal filter unit (typically with 10 kDa molecular mass cut-off membranes). Except for crystallization of the apo-protein, Csp1 was further purified by gel-filtration chromatography on a Superdex 75 10/300 GL column (GE Healthcare) equilibrated with 20 mM HEPES pH 7.5 containing 200 mM NaCl. The protein was found to contain no copper or zinc by atomic absorption spectrometry (AAS, with an M Series spectrometer, Thermo Electron), typically with ten standards containing up to 1.8 ppm copper and 1.0 ppm zinc in 2% HNO₃ using the standard calibration method. The mass of purified Csp1 was verified by both matrix-assisted laser desorption ionization time-of-flight and Fourier transform ion cyclotron resonance MS. The Met residue introduced at the N terminus during cloning is largely cleaved in the overexpression host, giving a purified protein with an experimental mass of 12,589.8 Da, consistent with the calculated value (12,591.4 Da) for a mature protein having Gly1 and Ala122 at the N and C termini, respectively.

Isolation, purification and quantification of mbtin. The apo- and Cu(I)-forms of full-length mbtin from M. trichosporium OB3b were isolated, purified and quantified as described previously.

Quantification of Csp1. Apo-Csp1 was quantified by the reduction of 5,5′-dithiobis(2-nitrobenzoic acid) (DTNB, Ellman’s reagent) in a sealed anaerobic quartz cuvette monitored at 412 nm using a Lambda 35 UV-visible (UV-vis) spectrophotometer (Perkin Elmer). The reaction was initiated in the anaerobic chamber by the addition of apo-Csp1 (final concentration typically 0.2–4 μM) to a buffered solution of DTNB (240–480 μM) in the presence of urea (final concentration >7 M). Under these conditions Csp1 rapidly unfolds, particularly at concentrations <8 μM (Extended Data Fig. 5g, h). The buffer was typically 20 mM HEPES pH 7.5 plus 200 mM NaCl and 1 mM EDTA, but in some cases 100 mM phosphate pH 8.0 plus 1 mM EDTA was used (the 5–10 mM DTNB stock solution was always made in 100 mM phosphate pH 8.0 plus 1 mM EDTA). After approximately 10–20 min, the absorbance at 412 nm reached a plateau, and it was assumed that all 13 Cys residues of unfolded apo-Csp1 had reacted with DTNB (in the absence of urea very little reaction with DTNB occurs, consistent with the structure of the apo-protein (Fig. 2d)). Apo-Csp1 (~4–23 μM) incubated overnight in the anaerobic chamber with DTT (~2–6.4 mM) and desalted on a PD10 column (GE Healthcare) was also quantified with DTNB in 20 mM HEPES pH 7.5 plus 200 mM NaCl, 1 mM EDTA and >7 M urea (again, little reaction with DTNB occurred in the absence of urea). At higher DTT concentrations, samples were desalted twice to ensure that all DTT had been removed. For the DTNB reaction, a molar absorption coefficient (ε value) of 14,150 M⁻¹ cm⁻¹ at 412 nm (refs 24, 33, 34) was verified using glutathione with and without >7 M urea, both in 20 mM HEPES pH 7.5 plus 200 mM NaCl and 1 mM EDTA (three times) and in 100 mM phosphate pH 8.0 plus 1 mM EDTA (twice). Apo-Csp1 concentrations were also determined using the Bradford assay (Coomassie Plus protein assay kit, Thermo Scientific) with BSA standards (0–1,000 μg ml⁻¹). The ratio of apo-Csp1 concentrations obtained using the Bradford and DTNB assays (Bradford:DTNB ratio) was 1.11 ± 0.12 (n = 27) for samples not treated with DTT and 1.03 ± 0.04 (n = 9) for samples that were reduced before these assays. Incubation with DTT does not have a significant effect on the thiol count, and this step was excluded from all subsequent experiments because contamination of apo-Csp1 with even trace amounts of DTT would influence quantification using DTNB.
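The DTNB assay converts the plateau absorbance at 412 nm into a thiol concentration with the Beer–Lambert law and the ε value given above, and then divides by the 13 Cys residues assumed to react per unfolded monomer. A minimal sketch follows. The blank subtraction, 1 cm path length and dilution factor are assumptions about routine practice rather than details stated in the text, and the function names and sample values are hypothetical.

```python
EPS_TNB_412 = 14_150.0  # M^-1 cm^-1 at 412 nm (from the text)
CYS_PER_MONOMER = 13    # all Cys residues assumed to react in >7 M urea


def csp1_concentration(a412, a412_blank=0.0, path_cm=1.0, dilution=1.0):
    """Apo-Csp1 monomer concentration (M) in the original sample from DTNB data."""
    thiol_m = (a412 - a412_blank) / (EPS_TNB_412 * path_cm)  # Beer-Lambert law
    return thiol_m / CYS_PER_MONOMER * dilution


def bradford_dtnb_ratio(bradford_m, dtnb_m):
    """Agreement check between the two assays (the text reports means of 1.11 and 1.03)."""
    return bradford_m / dtnb_m


conc = csp1_concentration(0.3679)  # plateau A412 for a hypothetical sample
print(f"{conc * 1e6:.2f} uM")       # 2.00 uM
print(round(bradford_dtnb_ratio(2.2e-6, conc), 2))  # 1.1
```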

Investigating Cu(I) binding to Csp1. Cu(I) stock solutions (50–100 mM [Cu(CH₃CN)₄]PF₆ in 100% acetonitrile) were diluted to ~1–12 mM in 20 mM HEPES pH 7.5 plus 200 mM NaCl in the anaerobic chamber. Copper concentrations were determined by AAS, and Cu(I) was quantified anaerobically by UV-vis with the chromophoric high-affinity Cu(I) ligands BCS and BCA by monitoring formation of [Cu(BCS)₂]³⁻ and [Cu(BCA)₂]³⁻ at 483 nm (ε = 12,500 M⁻¹ cm⁻¹) and 562 nm (ε = 7,700 M⁻¹ cm⁻¹), respectively. Cu(I) was added to apo-Csp1 by mixing the appropriate amount of the buffered Cu(I) solution with apo-protein (~2–200 μM) that had been quantified by DTNB, in 20 mM HEPES pH 7.5 plus 200 mM NaCl in the anaerobic chamber. Using DTNB in urea is a more precise method for quantifying apo-Csp1 than the Bradford assay, particularly at low protein concentrations, and was therefore used routinely. However, the DTNB assay in urea cannot be used for Cu(I)-Csp1 owing to very slow unfolding (Extended Data Fig. 5i). Therefore, for most Cu(I)-Csp1 samples the number of Cu(I) equivalents quoted is based on the apo-protein concentration determined by DTNB (the Cu(I)-Csp1 concentrations take into account dilution of the sample due to the addition of Cu(I), and the number of Cu(I) equivalents quoted is that in the final sample). The [Cu(I)]/[Csp1] ratio was routinely checked using protein (Bradford assay) and copper (AAS, and 2.5 mM BCS both in the absence and presence of >7 M urea, compared in Extended Data Fig. 2d) quantifications, with good agreement. For titrations (performed more than ten times), Cu(I) from the buffered solution was added to apo-Csp1 (~2–20 μM) in 20 mM HEPES pH 7.5 plus 200 mM NaCl in a sealed anaerobic quartz cuvette using a gastight syringe (Hamilton). The immediate appearance of ligand-to-metal charge transfer (LMCT) bands, characteristic of Cu(I) coordination by thiolates, was monitored in the UV region. Emission spectra were acquired on a Cary Eclipse fluorescence spectrophotometer (Varian) by exciting at 280 nm and following the emission in the 400–700 nm range, using emission and excitation slits of 20 and 10 nm, respectively. The concentration of the Cu(I) solution was regularly checked during titrations, usually with BCS, and replaced as required.
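The Cu(I) bookkeeping described above can be sketched in two steps: convert the absorbance of [Cu(BCS)₂]³⁻ or [Cu(BCA)₂]³⁻ to a Cu(I) concentration using the stated ε values, and then compute the Cu(I) equivalents and the diluted protein concentration after a Cu(I) addition. The volumes, concentrations and function names below are hypothetical, a 1 cm path length is assumed, and the ligand is assumed to be in excess so that all Cu(I) is present as the 1:2 complex.

```python
# Wavelength (nm) and molar absorption coefficient (M^-1 cm^-1) from the text.
CHROMOPHORES = {"BCS": (483, 12_500.0), "BCA": (562, 7_700.0)}


def cu_from_absorbance(absorbance, ligand, path_cm=1.0):
    """Cu(I) concentration (M) from the absorbance of the [Cu(L)2]3- complex."""
    _wavelength_nm, eps = CHROMOPHORES[ligand]
    return absorbance / (eps * path_cm)


def after_addition(protein_m, protein_ml, cu_stock_m, added_ml):
    """Return (Cu(I) equivalents, diluted protein concentration in M) after mixing."""
    total_ml = protein_ml + added_ml
    protein_final = protein_m * protein_ml / total_ml
    cu_final = cu_stock_m * added_ml / total_ml
    return cu_final / protein_final, protein_final


stock = cu_from_absorbance(0.25, "BCS") * 500  # hypothetical 500-fold diluted aliquot
eq, protein_final = after_addition(20e-6, 1.0, stock, 0.024)
print(f"stock {stock * 1e3:.1f} mM, {eq:.1f} equivalents, {protein_final * 1e6:.2f} uM")
```

Because protein and Cu(I) are diluted by the same factor, the number of equivalents does not depend on the added volume; the dilution correction matters for the protein concentration itself.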
Competition between Csp1 and chromophoric ligands. The binding of Cu(I) by Csp1 in the presence of either BCA or BCS was investigated in several ways. Cu(I) was added to Csp1 (~1.6-3.0 μM) in the presence of ~90-110 μM BCA in 20 mM MES at pH 5.5 (the pKa of BCA is 3.8 (ref. 25), so its β2 value is hardly affected at this pH) and pH 6.5, HEPES at pH 7.5, TAPS at pH 8.5 and CHES at pH 9.5, all plus 200 mM NaCl. [Cu(BCA)₂]³⁻ concentrations were determined under anaerobic conditions as described above. This experiment was repeated twice at pH 6.5 and at least three and up to six times at the other pH values, except at pH 7.5, where it was performed more than ten times. Except at pH 5.5, equilibration typically took less than 10 min (~20 min at ~11-15 Cu(I) equivalents). During these titrations the concentration of the Cu(I) solution was regularly checked and replaced as required. For experiments at pH 5.5 (MES), and at higher concentrations of BCA (up to ~1.2 mM, using ~2.0-3.7 μM apo-Csp1), UV-vis spectra were acquired between 4 and 48 h after mixing to ensure that equilibration had occurred (experiments at pH 6.5 and 9.5 were repeated two and four times, respectively, and the experiment at pH 7.5 six times). The final pH values of samples were checked at the end of experiments and were within 0.1 pH units (0.2 at pH 5.5) of the buffer used.
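The claim that the β2 of BCA is hardly affected at pH 5.5 can be checked with a conditional-constant estimate. The sketch below is an illustration, not the authors' calculation: it assumes that a single protonation with the quoted pKa of 3.8 is the only one that matters for each BCA ligand, so the effective (conditional) constant is β2 multiplied by the square of the deprotonated fraction of free ligand. The function names are hypothetical.

```python
import math

def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of a ligand with one relevant protonation site that is deprotonated."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

def conditional_log_beta2(log_beta2: float, pH: float, pKa: float) -> float:
    """Conditional log beta2 for Cu(L)2, assuming only deprotonated L binds Cu(I).

    beta2' = beta2 * alpha**2, where alpha is the deprotonated fraction of free L.
    """
    alpha = fraction_deprotonated(pH, pKa)
    return log_beta2 + 2.0 * math.log10(alpha)

if __name__ == "__main__":
    # BCA: log beta2 = 17.7 and pKa = 3.8, as quoted in the Methods.
    for pH in (5.5, 6.5, 7.5):
        shift = conditional_log_beta2(17.7, pH, 3.8) - 17.7
        print(f"pH {pH}: change in log beta2 = {shift:.3f}")
```

Under these assumptions the change at pH 5.5 is only about -0.017 log units (a factor of about 0.96). This is small compared with the gap of roughly 3 log units between the β2 values quoted for BCA and BCS.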

A comparison of the ability of Csp1 to compete with BCA and BCS was performed at least three times by incubating Cu(I)-Csp1 (~2.4-2.7 μM) loaded with ~10-13 equivalents of Cu(I) with various concentrations of either ligand in 20 mM HEPES pH 7.5 plus 200 mM NaCl in the anaerobic chamber. [Cu(BCS)₂]³⁻ and [Cu(BCA)₂]³⁻ concentrations were determined by UV-vis spectroscopy under anaerobic conditions for mixtures incubated for various times (4-48 h), with very little change. Furthermore, apo-Csp1 (~2.5-2.8 μM) plus BCS (~100 or 250 μM) was incubated anaerobically with 0 to ~22 Cu(I) equivalents in 20 mM HEPES pH 7.5 plus 200 mM NaCl (repeated four times). The absorbance at 483 nm was monitored anaerobically from 4 h and up to 43 h after mixing (no variation observed). The kinetics of Cu(I) removal by ~2,500 μM BCS from Cu(I)-Csp1 (~0.3-1.6 μM) loaded with ~11-14 equivalents of Cu(I) were compared (five times) in the absence and presence of >7 M urea, monitoring anaerobically at 483 nm in 20 mM HEPES pH 7.5 plus 200 mM NaCl.
Estimation of the average Cu(I) affinity of Csp1. The average Cu(I) affinity of Csp1 was estimated by determining the Cu(I) occupancy of Csp1 as a function of the concentration of free Cu(I) ([Cu(I)free]), buffered using either BCA or BCS. Apo-Csp1 (~2.7 and ~3.6 μM, respectively) in 20 mM HEPES pH 7.5 plus 200 mM NaCl was mixed anaerobically with increasing Cu(I) concentrations in the presence of BCS (101 μM) or BCA (1,210 and 2,000 μM). Mixtures were incubated for up to 67 h. The final pH values of samples were checked at the end of experiments. The concentrations of [Cu(BCA)₂]³⁻ and [Cu(BCS)₂]³⁻ ([Cu(L)2], where L = BCA or BCS) were determined as described above, and the concentration of Cu(I) bound to Csp1 ([Cu(I)Csp1]) was calculated using equation (1), where [Cu(I)total] is the total concentration of Cu(I) added:

$$[\mathrm{Cu(I)}_{\mathrm{Csp1}}] = [\mathrm{Cu(I)}_{\mathrm{total}}] - [\mathrm{Cu(L)}_2] \qquad (1)$$

[Cu(I)free] was calculated using equation (2):

$$[\mathrm{Cu(I)}_{\mathrm{free}}] = \frac{[\mathrm{Cu(L)}_2]}{[\mathrm{L}']^{2}\,\beta_2} \qquad (2)$$

where [L′] is the free ligand concentration ([L′] = [L] − 2[Cu(L)2], with [L] the total ligand concentration) and β2 is the affinity of either BCA or BCS for Cu(I) (log β2 = 17.7 and 20.8, respectively).
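Equations (1) and (2) reduce to a short calculation per sample. The following Python sketch assumes that all concentrations are in molar units and that [Cu(L)2] has already been obtained from the absorbance. Only the two log β2 values come from the text; the function names and the example numbers are hypothetical.

```python
LOG_BETA2 = {"BCA": 17.7, "BCS": 20.8}  # log beta2 values quoted in the text

def cu_bound_to_csp1(cu_total: float, cu_l2: float) -> float:
    """Equation (1): [Cu(I)Csp1] = [Cu(I)total] - [Cu(L)2] (molar units)."""
    return cu_total - cu_l2

def free_cu(cu_l2: float, ligand_total: float, ligand: str) -> float:
    """Equation (2): [Cu(I)free] = [Cu(L)2] / ([L']**2 * beta2).

    [L'] = [L] - 2[Cu(L)2] is the ligand not tied up in the Cu(L)2 complex.
    """
    l_free = ligand_total - 2.0 * cu_l2
    if l_free <= 0.0:
        raise ValueError("no free ligand left; equation (2) cannot be applied")
    beta2 = 10.0 ** LOG_BETA2[ligand]
    return cu_l2 / (l_free ** 2 * beta2)

if __name__ == "__main__":
    # Hypothetical sample: 2.7 uM Csp1, 20 uM Cu(I) added, 101 uM BCS, 10 uM [Cu(BCS)2] measured.
    csp1, cu_total, bcs, cu_bcs2 = 2.7e-6, 20e-6, 101e-6, 10e-6
    bound = cu_bound_to_csp1(cu_total, cu_bcs2)
    print(f"Cu(I) per Csp1: {bound / csp1:.2f}")
    print(f"[Cu(I)free]: {free_cu(cu_bcs2, bcs, 'BCS'):.2e} M")
```

Dividing the bound Cu(I) by the Csp1 concentration gives the occupancy used in the Hill analysis. The explicit check on [L′] guards against applying equation (2) once the ligand is saturated.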
The Cu(I) occupancy was determined by dividing [Cu(I)Csp1], obtained from equation (1), by the Csp1 concentration. The fractional Cu(I) occupancy was calculated using the maximum value observed in a particular experiment (see below), and plots against [Cu(I)free] were fitted to the nonlinear form of the Hill equation (3) to obtain the average dissociation constant of Csp1 for Cu(I) (KCu) and the Hill coefficient (n value):

$$\text{Fractional Cu(I) occupancy} = \frac{[\mathrm{Cu(I)}_{\mathrm{free}}]^{n}}{K_{\mathrm{Cu}}^{n} + [\mathrm{Cu(I)}_{\mathrm{free}}]^{n}} \qquad (3)$$

The maximum calculated Cu(I) occupancies of Csp1 were 11.3 and 11.7 equivalents for the BCS and BCA experiments at 101 and 1,210 μM, respectively. Cu(I)-Csp1 samples with the maximum occupancy (for BCS from the titration, and for BCA using samples prepared specifically) were separated from [Cu(L)₂]³⁻ and free L using a PD10 column in 20 mM HEPES pH 7.5 plus 200 mM NaCl. The protein and copper contents of the Cu(I)-Csp1-containing fractions were analysed with Bradford assays and using BCS in the presence of >7 M urea. For samples from the experiment with 101 μM BCS to which 24.2-25.5 equivalents of Cu(I) were added (11.0-11.3 equivalents calculated using equation (1)), 11.6 ± 0.6 equivalents of Cu(I) per Csp1 were determined, and 12.8 ± 0.7 for a Csp1 sample in the presence of 1,200 μM BCA to which 18.4 equivalents of Cu(I) were added (11.3-11.6 equivalents calculated using equation (1)). In experiments with BCA,
[Cu(I)Csp1] appeared to decrease at higher Cu(I) concentrations, and this effect was much greater at higher BCA concentrations. For the data shown in Fig. 4b at 1,210 μM BCA, [Cu(I)Csp1] decreased by <10% of the maximum value when 28.9 equivalents of Cu(I) were added (~20% upon addition of 49.5 equivalents of Cu(I) to a separate apo-Csp1 sample). In an experiment at 2,000 μM BCA, the maximum number of Cu(I) equivalents bound by Csp1 calculated using equation (1) was 9.33, reached upon addition of 19.3 equivalents of Cu(I), and [Cu(I)Csp1] appeared to decrease by almost 50% upon addition of 49.6 equivalents of Cu(I). A fit of the data up to 19.3 added Cu(I) equivalents to the nonlinear Hill equation gives KCu = (6.3 ± 0.1) × 10⁻¹⁸ M (n = 4.1 ± 0.2), consistent with the values at lower BCA concentration (see Fig. 4c). A comparable apo-Csp1 sample in the presence of 2,000 μM BCA plus 59.9 equivalents of Cu(I), which gave 5.18 Cu(I) equivalents calculated using equation (1), was found to contain 11.6 ± 0.6 equivalents of Cu(I) after desalting on a PD10 column. We are currently unsure of the reason for the apparent decrease in [Cu(I)Csp1], but it could result from the formation of Csp1-Cu(I)-BCA adducts. Such species may contribute to the absorbance at 562 nm, resulting in an apparent decrease in [Cu(I)Csp1]. However, this has little effect on the data at 1,210 μM BCA, and the agreement in KCu between this experiment and that at 101 μM BCS (see Extended Data Fig. 6d) is very good (a fit of a repeat of the experiment with BCS to the nonlinear Hill equation gave KCu = (1.3 ± 0.1) × 10⁻¹⁷ M (n = 2.4 ± 0.2)).
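The Methods fit fractional occupancy against [Cu(I)free] to the nonlinear Hill equation (3). As a dependency-free illustration, the sketch below instead uses the linearized Hill plot, log10(θ/(1 − θ)) = n·log10[Cu(I)free] − n·log10 KCu. This substitute method recovers KCu and n exactly from noise-free data but weights noisy points differently from a nonlinear least-squares fit, so it is not the authors' procedure. A nonlinear fit of equation (3), for example with `scipy.optimize.curve_fit`, would be closer to what is described. The data in the usage example are synthetic, and the function names are hypothetical.

```python
import math

def hill_fraction(free_cu: float, k_cu: float, n: float) -> float:
    """Equation (3): fractional occupancy = x**n / (K_Cu**n + x**n)."""
    return free_cu ** n / (k_cu ** n + free_cu ** n)

def fit_hill_linearized(free_cu, fraction):
    """Estimate (K_Cu, n) from a Hill plot by ordinary least squares.

    Uses log10(f / (1 - f)) = n * log10(x) - n * log10(K_Cu); points with
    f <= 0, f >= 1 or x <= 0 are skipped because the transform is undefined.
    """
    xs, ys = [], []
    for x, f in zip(free_cu, fraction):
        if x > 0.0 and 0.0 < f < 1.0:
            xs.append(math.log10(x))
            ys.append(math.log10(f / (1.0 - f)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < fraction < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                    # slope of the Hill plot is the Hill coefficient
    intercept = mean_y - n * mean_x  # intercept equals -n * log10(K_Cu)
    return 10.0 ** (-intercept / n), n

if __name__ == "__main__":
    # Synthetic, noise-free data (hypothetical): K_Cu = 1.3e-17 M, n = 2.7.
    free = [f * 1e-17 for f in (0.3, 0.6, 0.9, 1.2, 1.8, 2.7, 4.0)]
    occ = [hill_fraction(x, 1.3e-17, 2.7) for x in free]
    k_fit, n_fit = fit_hill_linearized(free, occ)
    print(f"K_Cu = {k_fit:.2e} M, n = {n_fit:.2f}")
```

A Hill coefficient above 1 from either kind of fit is what the Extended Data Fig. 6 legend interprets as positive cooperativity.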

Analytical gel-filtration chromatography of apo- and Cu(I)-Csp1. Analytical gel-filtration chromatography of apo-Csp1 (~10-100 μM) and protein (~2-120 μM) plus ~12-14 equivalents of Cu(I) was performed on a Superdex 75 10/300 GL column equilibrated in 20 mM HEPES pH 7.5 plus 200 mM NaCl, degassed and purged with nitrogen. Injection volumes ranged from 100 to 350 μl, the flow rate was 0.8 ml min⁻¹ and absorbance was monitored at 280 nm. Apparent molecular masses of 51 ± 3 (n = 21) and 50 ± 3 (n = 18) kDa for apo- and Cu(I)-Csp1, respectively, were calculated from elution volumes by calibrating the column with a low molecular mass calibration kit (GE Healthcare). The gel-filtration analysis of apo-Csp1 (~70-150 μM) plus ~22-26 equivalents of Cu(I) was performed three times, and the eluted fractions were quantified for protein with Bradford assays, for copper by AAS and for Cu(I) using BCS in the presence of 7.6 M urea.
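Apparent masses from analytical gel filtration are conventionally read off a calibration line of log10(molecular mass) against elution volume for standards of known mass. The sketch below fits such a line by ordinary least squares. The standards' masses and elution volumes in the usage example are hypothetical placeholders, not values from this work. Many laboratories plot the partition coefficient Kav = (Ve − V0)/(Vt − V0) instead of Ve, which changes the x axis but not the method.

```python
import math

def fit_log_mass_line(elution_ml, mass_kda):
    """Least-squares fit of log10(mass) = intercept + slope * Ve; returns (intercept, slope)."""
    xs = list(elution_ml)
    ys = [math.log10(m) for m in mass_kda]
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two standards with matching volumes and masses")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def apparent_mass_kda(elution_ml: float, intercept: float, slope: float) -> float:
    """Interpolate an apparent mass (kDa) from an elution volume (ml)."""
    return 10.0 ** (intercept + slope * elution_ml)

if __name__ == "__main__":
    # Hypothetical calibration standards: (elution volume in ml, mass in kDa).
    standards = [(9.8, 75.0), (10.9, 44.0), (11.9, 29.0), (13.3, 13.7), (14.6, 6.5)]
    b0, b1 = fit_log_mass_line([v for v, _ in standards], [m for _, m in standards])
    print(f"apparent mass at 10.8 ml: {apparent_mass_kda(10.8, b0, b1):.1f} kDa")
```

An apparent mass read this way assumes a roughly globular shape, which is why it is reported as "apparent".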

Circular dichroism spectroscopy. Far-UV circular dichroism spectra (180-250 nm) were recorded using a JASCO J-810 spectrometer. Apo-Csp1 and protein plus 14.0 equivalents of Cu(I) were analysed in 20 mM HEPES pH 7.5 plus 200 mM NaCl and in 100 mM potassium phosphate pH 8 (buffer exchanged using a PD10 column). Protein concentrations ranged from 7.94 to 39.7 μM (0.10 to 0.50 mg ml⁻¹). The pH stability of apo-Csp1 in 20 mM buffer at pH 5.5 (MES), 7.5 (HEPES) and 9.5 (CHES), all plus 200 mM NaCl, was monitored during a 43 h incubation in the anaerobic chamber; the final pH values of samples were within 0.1 pH unit of the expected value (repeated at least twice). The α-helical content of apo- and Cu(I)-Csp1 was routinely found to be ~80% (repeated more than ten times). The unfolding of apo-Csp1 and of protein plus 14.0 equivalents of Cu(I) was monitored after the addition of urea (final concentration >7 M) at pH 7.5.
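The paired concentrations quoted above (7.94 μM for 0.10 mg ml⁻¹ and 39.7 μM for 0.50 mg ml⁻¹) imply a conversion factor of roughly 12.6 kDa. That molar mass is inferred in the sketch below from the quoted pairs; it is not stated in the text, and a mass calculated from the mature Csp1 sequence would be the better input. The function names are hypothetical.

```python
def implied_molar_mass(mg_per_ml: float, micromolar: float) -> float:
    """g/mol implied by a paired mass concentration (mg/ml = g/l) and molar concentration (uM)."""
    return mg_per_ml / (micromolar * 1e-6)

def micromolar_from_mg_per_ml(mg_per_ml: float, molar_mass: float) -> float:
    """Convert mg/ml (g/l) to uM for a protein of the given molar mass (g/mol)."""
    return mg_per_ml / molar_mass * 1e6

if __name__ == "__main__":
    pairs = [(0.10, 7.94), (0.50, 39.7)]  # (mg/ml, uM) pairs quoted in the Methods
    masses = [implied_molar_mass(c, m) for c, m in pairs]
    print([round(m) for m in masses])
    print(f"{micromolar_from_mg_per_ml(0.25, masses[0]):.1f} uM for 0.25 mg/ml")
```

The pairs quoted in the Extended Data Fig. 5 legend (for example 34.1 μM for 0.43 mg ml⁻¹) give implied masses within about 0.3% of this value, which is consistent with rounding of the mg ml⁻¹ figures.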

Cu(I) exchange between Csp1 and mbtin. Cu(I)-Csp1 (~0.8-1.0 μM) loaded with ~13 equivalents of Cu(I) was added to apo-mbtin (~10-13 or ~20-27 μM). UV-vis spectra were acquired before the addition of Cu(I)-Csp1 and up to 6 h after mixing (repeated three times). Controls were performed by adding ~10-13 μM Cu(I) to apo-mbtin (~10-13 or ~20-27 μM). The possibility of copper transfer from Cu(I)-mbtin (typically ~2.2-3.0 μM) to apo-Csp1 (143 to 234 μM) was investigated for up to 20 h by UV-vis spectroscopy (repeated four times). All mbtin-containing samples were incubated anaerobically and protected from light.

Crystallization, data collection, structure solution and refinement. Crystals of apo-Csp1 (~20 mg ml⁻¹, Bradford assay) in 20 mM HEPES pH 7.5 were obtained aerobically at 20 °C using the hanging-drop method of vapour diffusion from 1 μl of protein mixed with 1 μl of 0.1 M bis-Tris pH 6.5 plus 25% w/v PEG 3350 (500 μl well solution) and were frozen in Paratone-N oil. Cu(I)-Csp1 (~9.5-11 mg ml⁻¹, Bradford assay) was prepared by adding ~12-14 equivalents of Cu(I) to apo-protein (~70-80 μM, quantified by DTNB) in 20 mM HEPES pH 7.5 plus 200 mM NaCl and was subsequently concentrated by ultrafiltration (all performed anaerobically). Cu(I)-Csp1 prepared anaerobically was removed from the anaerobic chamber for <20 min (Cu(I)-Csp1 shows no sign of oxidation even after incubation in air for 42 h) to allow screens to be set up with a crystallization robot. The crystallization trays were transferred back into the anaerobic chamber via a port that was purged with nitrogen for 3 min and were in the chamber for <5 min before being sealed. Cu(I)-Csp1 crystallized at room temperature using the sitting-drop method of vapour diffusion with 200 nl of protein plus 100 nl of 0.03 M MgCl₂, 0.03 M CaCl₂, 0.1 M Tris-Bicine pH 8.5, 12.5% w/v 2-methyl-2,4-pentanediol (racemic) plus 12.5% PEG 1000 and 12.5% PEG 3350 (80 μl well solution). The Cu(I)-Csp1 crystal for the X-ray absorption near-edge spectrum shown in Extended Data Fig. 2c was obtained as above but using 600 nl of protein plus 300 nl of 0.025 M MgCl₂, 0.025 M CaCl₂, 0.1 M Tris-Bicine pH 8.5, 13.5% v/v 2-methyl-2,4-pentanediol (racemic) plus 13.5% PEG 1000 and 13.5% PEG 3350 (80 μl well solution). Crystals were frozen directly in the reservoir solution.

Diffraction data were collected at the Diamond Light Source, UK, on beamlines I02 (apo-Csp1, λ = 0.9795 Å) and I24 (Cu(I)-Csp1, λ = 1.3777 Å) at 100 K, processed and integrated with XDS and scaled using Aimless. For both data sets, space groups were determined using Pointless and later confirmed during refinement. The structure was solved by single-wavelength anomalous dispersion (SAD) using copper, but phasing was complicated by poorly resolved low-resolution reflections. Omitting data from 44.42 to 10.00 Å was required for successful heavy-atom location and the calculation of initial phases. Phasing, density modification and initial model building were performed using PHASER_EP through the CCP4 interface, using SHELXD, PARROT and BUCCANEER. The model of Cu(I)-Csp1 was used as the search model for molecular replacement in Molrep to solve the apo-protein data set. The first 11 residues could not be modelled in either structure (His12 is close to the open end of an adjacent monomer in the Csp1 tetramer). Solvent molecules were added using COOT and checked manually. Simple solvent scaling was used for the apo-Csp1 model and Babinet solvent scaling for the Cu(I)-Csp1 model. All other computing used the CCP4 suite of programs. Five per cent of observations were randomly selected for the Rfree set. The models were validated using MolProbity, and data statistics and refinement details are reported in Extended Data Table 1. In a Ramachandran plot, 100% of residues are in the most favoured regions for both models, and chain A of apo- and Cu(I)-Csp1 overlay with a root mean squared deviation of 0.42 Å. In the structure of Cu(I)-Csp1 (5AJF), the Cu(I) ions referred to herein as Cu1-Cu13 are numbered A1123-A1135 in chain A, and the corresponding sites in chain B are numbered B1123-B1135.
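For copper SAD phasing, the anomalous signal (f″) from copper is strongest near and just above the Cu K absorption edge. Converting the quoted wavelengths to photon energies with E = hc/λ (hc ≈ 12398.42 eV Å) shows that the Cu(I)-Csp1 data at 1.3777 Å were collected about 20 eV above the tabulated Cu K edge of about 8979 eV. The apo-Csp1 data at 0.9795 Å, solved by molecular replacement, lie far above the edge. The edge energy is a standard reference value, not one given in the text, and this sketch is only an illustration.

```python
HC_EV_ANGSTROM = 12398.42  # hc in eV * Angstrom (rounded CODATA value)
CU_K_EDGE_EV = 8979.0      # tabulated Cu K absorption edge; standard value, not from this work

def wavelength_to_energy_ev(wavelength_angstrom: float) -> float:
    """Photon energy (eV) for an X-ray wavelength given in Angstrom."""
    return HC_EV_ANGSTROM / wavelength_angstrom

if __name__ == "__main__":
    for label, wl in [("apo-Csp1, I02", 0.9795), ("Cu(I)-Csp1, I24", 1.3777)]:
        energy = wavelength_to_energy_ev(wl)
        print(f"{label}: {energy:.0f} eV ({energy - CU_K_EDGE_EV:+.0f} eV from the Cu K edge)")
```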

X-ray absorption near-edge spectroscopy. X-ray absorption near-edge spectroscopy was conducted on beamline I24 at the Diamond Light Source, UK, using a Vortex-EX detector (Hitachi). X-ray fluorescence was measured on a fresh Cu(I)-Csp1 crystal between 8,948 and 9,030 eV, with an acquisition time of 3 s per data point and a constant step of 0.5 eV, for the spectrum shown in Extended Data Fig. 2c (measurements on at least two other crystals gave very similar spectra).
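The scan parameters above fix the size of each XANES measurement. A small sketch gives 165 points and about 8 minutes of counting per spectrum. It assumes that both end points are included and counts only detector dwell time, ignoring monochromator moves and readout. The tabulated Cu K-edge energy (about 8979 eV) lies inside the 8,948-9,030 eV window, but it is a standard value rather than one stated in the text. The function name is hypothetical.

```python
CU_K_EDGE_EV = 8979.0  # tabulated Cu K-edge energy; standard value, not from this paper

def scan_plan(start_ev: float, stop_ev: float, step_ev: float, dwell_s: float):
    """Number of points and total counting time (s) for a constant-step scan, ends inclusive."""
    n_points = int(round((stop_ev - start_ev) / step_ev)) + 1
    return n_points, n_points * dwell_s

if __name__ == "__main__":
    n, seconds = scan_plan(8948.0, 9030.0, 0.5, 3.0)
    print(n, seconds / 60.0)  # counting time only; motor moves and readout are ignored
    print(8948.0 <= CU_K_EDGE_EV <= 9030.0)
```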

Construction of strain Δcsp1/csp2 of M. trichosporium OB3b. A double mutant, strain Δcsp1/csp2, was constructed from M. trichosporium OB3b by sequential deletion of csp1 followed by csp2, using a previously described method with minor modifications. In each case, using genomic DNA from M. trichosporium OB3b as template, upstream and downstream regions (approximately 500 base pairs) of the target were amplified by PCR using primers (see Extended Data Table 2) 684AF/684AR and 684BF/684BR (for csp1) and 1592AF/1592AR and 1592BF/1592BR (for csp2). The resulting fragments were cloned into pK18mobsacB (ref. 52), which was then used to transform E. coli strain S17.1 (ref. 53). The constructs were introduced into M. trichosporium OB3b by conjugation as previously described, except that nalidixic acid was not required to remove E. coli contamination. Single crossover mutants were selected on NMS plates containing kanamycin (12.5 μg ml⁻¹). After cultivation in liquid medium without selection, double crossover mutants, with a deletion of the target gene, were selected by plating on NMS plates containing sucrose (7.5% w/v). Gene deletion was confirmed by PCR with primers outside the cloned regions, 684TF/684TR2 (for csp1) and 1592TF/1592TR (for csp2), and by sequencing.
sMMO activity of wild-type and mutant strains of M. trichosporium OB3b. The wild type and strain Δcsp1/csp2 were grown in triplicate at 30 °C in NMS medium (50 ml) containing 6 μM copper, in 250 ml flasks supplied with 20% (v/v) methane and agitated at 150 r.p.m. At late exponential phase (A540 nm = 0.8-1.0), cells were harvested by centrifugation at 5,000g for 15 min (22 °C), washed once and resuspended in copper-free NMS medium to a density of 1.5-1.6 (A540 nm). Cell suspension (22 ml) was transferred to fresh 250 ml flasks and incubated with 20% v/v methane. Approximately every 3 h, samples (1 ml) were withdrawn and used to measure culture density (A540 nm) and sMMO activity. After 12.75 and 19.25 h, flasks were flushed with air and resupplied with methane. To estimate sMMO activity, cells (approximately 0.5 ml) were incubated with a few crushed crystals of naphthalene at 30 °C for 30 min before addition of 40 μl of tetrazotized o-dianisidine (10 mg ml⁻¹). Immediate development of an intense purple colour indicated sMMO activity. To quantify sMMO activity, 150 μl of cell suspension was centrifuged at 6,000g for 10 min (22 °C) and the cell pellet resuspended in 1 ml of 10 mM phosphate buffer pH 6.8 containing 10 mM formate. Crushed naphthalene crystals were added and the reaction was initiated by addition of 25 μl of tetrazotized o-dianisidine (10 mg ml⁻¹). The mixture was shaken vigorously and the absorbance at 528 nm, corresponding to the formation of naphthol, was monitored for 15 min at 30 °C. Activities were normalized to cell density.
Extended Data Figure 1 | Purification of proteins from M. trichosporium OB3b and the amino-acid sequence of Csp1. a, The copper content of anion-exchange fractions (NaCl gradient shown as a dashed line) and the SDS-PAGE analysis of selected fractions (1 ml) from the purification of soluble extract from M. trichosporium OB3b cells. The band just below the 14.4 kDa marker, indicated with an arrow, is present. Fraction 32 was judged to have the lowest level of contaminating proteins and was further purified by gel-filtration chromatography on a Superdex 75 column (b). Csp1 is present in the peak that elutes at ~11 ml and contains considerable copper (see Fig. 1c). c, The amino-acid sequence of Csp1 showing the predicted Tat leader peptide (the first 24 residues of the pre-protein) in italics. The 13 Cys residues are highlighted in yellow, and His36 (cyan), Met40, Met43 and Met48 (magenta) are also indicated (the numbering of these residues refers to the mature protein). The CXXXC and CXXC motifs are underlined. The region in bold corresponds to the single tryptic fragment identified on two separate occasions in MS analysis, representing 11% sequence coverage of the mature protein (Mascot search of peptide mass fingerprint, expectation value = 1.9 × 10^?). The sequence of this fragment was confirmed by liquid chromatography/MS/MS (data not shown). This is the only tryptic peptide from the mature protein that would be expected to be readily detected by MS (owing to either the small mass or the presence of Cys residues in all other theoretical tryptic fragments), and it is unique to this protein among all proteobacterial protein sequences in the NCBI nr database.
Extended Data Figure 2 | Cu(I) binding to Csp1. a, UV-vis difference spectra upon the addition of Cu(I) to apo-Csp1 (5.32 μM) showing the appearance of S(Cys)→Cu(I) LMCT bands. b, Plots of absorbance at 250 nm (filled squares), 275 nm (filled circles) and 310 nm (open circles) against [Cu(I)]/[Csp1] ratio, taken from the spectra in a. The absorbance rises steeply until ~11-15 Cu(I) equivalents but then continues to rise, particularly at lower wavelengths, making the binding stoichiometry difficult to determine precisely with this approach. Systems that bind multiple Cu(I) ions in clusters, such as those found in metallothioneins, typically give rise to luminescence at around 600 nm (refs 10, 13). However, limited luminescence is observed at 600 nm during the titration of Cu(I) into Csp1 (data not shown). c, X-ray absorption near-edge spectrum of a fresh crystal of Cu(I)-Csp1 at 100 K. d, Plots of [Cu(BCS)₂]³⁻ formation against time after the addition of Cu(I)-Csp1 (0.93 μM) loaded with 11.8 equivalents of Cu(I) to 2,510 μM BCS, either in the absence (dashed line) or presence (solid line) of 7.9 M urea. Cu(I) is removed faster in urea, and removal is limited by the rate of Cu(I)-Csp1 unfolding (Extended Data Fig. 5i). The presence of urea has little effect on the end point of this reaction. Experiments in a, b and d were all performed in 20 mM HEPES pH 7.5 containing 200 mM NaCl.
Extended Data Figure 3 | Sequence comparison of Csp1 homologues in M. trichosporium OB3b. The M. trichosporium OB3b genome possesses two genes that code for Csp1 homologues, Csp2 and Csp3, having 58 and 19% sequence identity to Csp1, respectively. The predicted Tat leader peptides of Csp1 (MERRDFVTAFGALAAAAAASSAFA) and Csp2 (MERRQFVAAIGAAAAAASASRAFA) are omitted. The Cys residues (13 in Csp1 and Csp2 and 18 in Csp3) are highlighted in yellow, with CXXXC and CXXC motifs underlined. A CXXXC motif in an α-helix allows both Cys residues to coordinate the same Cu(I) ion (Fig. 3d, e), which is not the case for a CXXC motif. This is consistent with the observation that a synthetic peptide containing a CXXC motif binds a Cu₄S₄ cluster via a four-helix bundle made from four peptides, with coordination involving only one Cys per peptide. The alignment was produced using the T-coffee alignment tool. Asterisks indicate fully conserved sequence positions; the ':' and '.' symbols indicate strongly and weakly similar sequence positions, respectively.
Extended Data Figure 4 | sMMO activity of wild-type M. trichosporium OB3b and the Δcsp1/csp2 strain. Purple colour, indicating sMMO activity, is evident at 19.25 h in the Δcsp1/csp2 strain (tubes 4-6) but not until 24.5 h in the wild type (WT, tubes 1-3) when using a qualitative assay. When quantified spectrophotometrically at 27.75 h, the average sMMO activity in the Δcsp1/csp2 strain (grey) is 1.8-fold greater (P = 0.04, one-tailed t-test) than that of the wild type (WT, white), as shown in the bar chart (mean ± s.d. of three replicates).
Extended Data Figure 5 | The dependence on pH of competition between Csp1 and BCA for Cu(I), and far-UV circular dichroism spectra showing pH stability and unfolding of Csp1 in urea. a, Plots of [Cu(BCA)₂]³⁻ concentration against [Cu(I)]/[Csp1] ratio for the addition of Cu(I) to apo-Csp1 (2.38-2.56 μM) in the presence of 103 μM BCA in 20 mM buffer (see Methods) at pH 5.5 (filled squares), 6.5 (filled circles), 7.5 (filled triangles), 8.5 (open circles) and 9.5 (open squares) plus 200 mM NaCl. Equilibration is fast (<20 min) at pH 6.5 and higher, and the data shown are from titrations of Cu(I) into apo-Csp1. At pH 5.5 equilibration is slower, and the data are for mixtures incubated for 21 h. Also shown are results for mixtures of Cu(I) with apo-Csp1 (3.31-3.67 μM) at pH 6.5 (b) and 9.5 (c) in the presence of 120 μM (filled squares), 300 μM (open circles), 450 μM (stars), 600 μM (filled triangles) and 900 μM (open squares) BCA, all after incubation for 15 h. At lower BCA concentrations, Csp1 is able to compete effectively for Cu(I) in the pH range 6.5-9.5, giving Cu(I) binding stoichiometries of 12-14 (see also Figs 3a and 4a). At pH 5.5, Csp1 competes less effectively with BCA for Cu(I), most probably because of the protonation of Cys ligands. This is consistent with greater competition by 600 μM BCA at pH 6.5 (b) compared with pH 7.5 (Fig. 4a). The stability of apo-Csp1 over the pH and time ranges used for experiments with BCA (and BCS) was determined using far-UV circular dichroism spectroscopy. The spectra of apo-Csp1 (solid lines) at pH 5.5 (d; 34.1 μM, 0.43 mg ml⁻¹), 7.5 (e; 36.5 μM, 0.46 mg ml⁻¹) and 9.5 (f; 32.6 μM, 0.41 mg ml⁻¹) are compared with those for samples incubated for 43 h (dashed lines), and for 3 h (dotted line) and 17 h (dashed/dotted lines) at pH 9.5. At pH 9.5 and in the presence of higher BCA concentrations (c), Csp1 binds approximately one fewer equivalent of Cu(I), which must result from the structural changes observed at this pH value (no change after 3 h, but a decrease of ~15-20% in α-helical content at longer incubation times; see f). However, the remaining sites bind Cu(I) more tightly (c) than at pH 7.5 (Fig. 4a) because of deprotonation of the Cys ligands. g, Far-UV circular dichroism spectra of apo-Csp1 (19.9 μM, 0.25 mg ml⁻¹) in 20 mM HEPES pH 7.5 containing 200 mM NaCl at 0, 30, 60, 120 and 240 min (solid lines) after the addition of urea (7 M), compared with the spectrum of apo-Csp1 in the same buffer without urea (dashed line). h, Far-UV circular dichroism spectra of apo-Csp1 (7.94 μM, 0.10 mg ml⁻¹) as in g, except that spectra were acquired at 0, 15, 30, 45 and 60 min (solid lines) after addition of urea (7 M); unfolding is significantly faster at lower protein concentrations, consistent with the reaction with DTNB in urea being complete in 20 min at Csp1 concentrations <4 μM. i, Far-UV circular dichroism spectra of Csp1 incubated with 14.0 equivalents of Cu(I) (19.9 μM, 0.25 mg ml⁻¹) as in g, but at 0, 60, 240, 360 and 480 min and 24 h (solid lines) after addition of urea (7 M), compared with the spectrum of Cu(I)-Csp1 in buffer without urea (dashed line). The arrow in g to i indicates how the spectrum changes with time.
Extended Data Figure 6 | Competition for Cu(I) between Csp1 and chromophoric ligands and the determination of the apparent average Cu(I) dissociation constant for Csp1 using BCS. a, Plots of [Cu(L)₂]³⁻ concentration against [L] (BCA or BCS) after the incubation of Cu(I)-Csp1 (2.59 μM) loaded with 10.4 equivalents of Cu(I) with different concentrations of BCA (filled circles) and BCS (filled squares) for 17 h. b, Plots of [Cu(BCS)₂]³⁻ concentration against [Cu(I)] for apo-Csp1 (2.71 μM) in the presence of 99.2 μM (open squares) and 248 μM (filled squares) BCS incubated with increasing concentrations of Cu(I) (0, 4.38, 11.0, 15.3 and 21.9 equivalents; data shown after 17 h incubation). BCS competes much more effectively with Csp1 for Cu(I) than BCA does, and [Cu(BCS)₂]³⁻ is formed stoichiometrically at 248 μM BCS. c, Plot of [Cu(BCS)₂]³⁻ concentration against the [Cu(I)]/[Csp1] ratio for mixtures of Cu(I) plus apo-Csp1 (2.70 μM) in the presence of 101 μM BCS (open squares) for 19 h. For comparison, the data from b (2.71 μM Csp1 in the presence of 99.2 μM BCS for 17 h) are also shown (filled squares). The data in a-c were all acquired in 20 mM HEPES plus 200 mM NaCl at pH 7.5. d, Fractional occupancy of Cu(I)-binding sites in Csp1 (the maximum value is 11.3 equivalents in this experiment) at different concentrations of free Cu(I) for the experiment shown in c. The solid line shows the fit of the data to the nonlinear Hill equation, giving KCu = (1.3 ± 0.1) × 10⁻¹⁷ M (n = 2.7 ± 0.2). Hill coefficients larger than 1 indicate positive cooperativity for Cu(I) binding by Csp1. Confirmation, and the cause, of this effect will be the subject of further studies.
Extended Data Figure 7 | Cu(I) exchange between Csp1 and mbtin. UV-vis spectra of apo-mbtin (dashed lines) and at various times up to 360 min (thick lines) after the addition of either Cu(I)-Csp1 or Cu(I). Cu(I)-Csp1 (1.02 μM) loaded with 13.0 equivalents of Cu(I) was added to either 13.4 μM (a) or 27.4 μM (c) apo-mbtin. Cu(I) alone (13.3 μM) was added to 13.4 μM (b) or 27.1 μM (d) apo-mbtin. Plots of absorbance at 394 nm against time for a-d are shown in Fig. 4d. Mbtin from M. trichosporium OB3b has a Cu(I) affinity of (6-7) × 10²⁰ M⁻¹ at pH 7.5 (determined using a log β2 value of 19.8 for [Cu(BCS)₂]³⁻; the affinity is an order of magnitude tighter if the more recent log β2 value of 20.8 (ref. 25) is used) and stoichiometrically removes Cu(I) from Csp1 within 1 h. e, UV-vis spectra of Cu(I)-mbtin (2.71 μM, black line) immediately after mixing with apo-Csp1 (234 μM, green line) and after incubation under anaerobic conditions for 1 h (blue line) and 20 h (red line). Small increases in absorbance are observed because of the absorbance of apo-Csp1 at these wavelengths and because of precipitation. The latter was more of a problem at longer incubation times, and the sample at 20 h required filtering before the spectrum shown was recorded. The small changes observed are not consistent with the formation of apo-mbtin. All experiments were performed in 20 mM HEPES pH 7.5 plus 200 mM NaCl.
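The legend notes that the mbtin affinity becomes an order of magnitude tighter when the log β2 of [Cu(BCS)₂]³⁻ is revised from 19.8 to 20.8. In a competition measurement the competitor's affinity is derived from the free Cu(I) concentration set by the chelator, [Cu(I)free] = [Cu(L)2]/([L′]²β2), which is the relationship used in equation (2) of the Methods. The derived affinity therefore scales linearly with the reference β2 when the measured concentrations are unchanged. The sketch below applies that scaling; the function name is hypothetical.

```python
def rescale_competition_affinity(k_old: float, old_log_beta2: float, new_log_beta2: float) -> float:
    """Rescale an affinity (M^-1) measured by competition with a reference chelator.

    K = [CuP] / ([Cu]free * [P]) and [Cu]free = [CuL2] / ([L]**2 * beta2), so with the
    measured concentrations unchanged, K is proportional to the reference beta2.
    """
    return k_old * 10 ** (new_log_beta2 - old_log_beta2)

if __name__ == "__main__":
    factor = rescale_competition_affinity(1.0, 19.8, 20.8)
    print(f"revised / original affinity = {factor:.1f}")
```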
[Extended Data Figure 8 alignment: N. gonorrhoeae, OB3b Csp1, OB3b Csp2, P. aeruginosa, OB3b Csp3, S. coelicolor, N. multiformis, R. leguminosarum, R. metallidurans, S. enterica, B. subtilis and L. pneumophila sequences, with consensus line]
Extended Data Figure 8 | Sequence comparison of Csp homologues from diverse bacteria. Homology searches show that Csp homologues are encoded in the genomes of diverse bacteria. A multiple sequence alignment of the three M. trichosporium OB3b proteins (OB3b Csp1, OB3b Csp2 and OB3b Csp3) with a selection of these proteins, including one member (from Neisseria gonorrhoeae) that also possesses a putative Tat signal sequence (underlined), shows that the Cys residues (highlighted in yellow) are highly conserved. The alignment was produced using the T-coffee alignment tool. Asterisks indicate fully conserved sequence positions; the ':' and '.' symbols indicate strongly and weakly similar sequence positions, respectively. N. gonorrhoeae sequence: open reading frame (ORF) NGAG_01502, UniProt accession C11025; P. aeruginosa sequence: ORF PA96_2930, UniProt accession X5E748 (PDB accession number 3KAW); Streptomyces coelicolor sequence: ORF SCO3281, UniProt accession Q9X8F4; N. multiformis sequence: ORF Nmul_A1745, UniProt accession Q2Y879 (PDB accession number 3LMF); Rhizobium leguminosarum sequence: ORF RLEG_20420, UniProt accession W0IHZ3; Ralstonia metallidurans sequence: ORF Rmet_5753, UniProt accession Q1ILB64; Salmonella enterica sv. Typhimurium sequence: ORF STM14_1521, UniProt accession D0ZVJ6; Bacillus subtilis sequence: ORF BSU10600, UniProt accession O07571; Legionella pneumophila sequence: ORF LPE509_p00081, UniProt accession M4SK87.
Extended Data Table 1 | Data collection and refinement statistics

                          Apo-Csp1              Cu(I)-Csp1
Data collection
Space group               P2₁                   P?
Cell dimensions
  a, b, c (Å)             40.9, 105.9, 48.7     44.4, 41.4, 53.1
  α, β, γ (°)             90.0, 112.5, 90.0     90.0, 92.6, 90.0
Resolution (Å)            44.95-1.50            53.06-1.90
                          (1.53-1.50)*          (1.95-1.90)*
Rmerge (%)                7.0 (50.5)            8.7 (43.3)
I/σI                      10.9 (2.6)            5.6 (2.1)
Completeness (%)          99.7 (99.8)           99.1 (97.1)
Redundancy                3.7 (?)               2.8 (2.4)

Refinement
Resolution (Å)            1.50                  1.90
No. reflections           60896 (3056)          15212 (990)
Rwork/Rfree               12.2/17.9             19.8/23.2
No. atoms
  Protein                 3209                  1575
  Ligand/ion                                    28
  Water                                         116
B-factors
  Protein                 27.0                  40.2
  Ligand/ion                                    41.4
  Water                                         47.7
R.m.s. deviations
  Bond lengths (Å)        0.020                 0.016
  Bond angles (°)         1.8                   1.6

*Highest-resolution shell is shown in parentheses.
Extended Data Table 2 | Primers used for cloning Csp1 and making the Δcsp1/csp2 M. trichosporium OB3b strain

Primer    Sequence (5′ to 3′)*
Csp1_F    GCGCATATGGGAGAGGATCCTCATGC
Csp1_R    GCGCCATGGTCAGGCGGCGACCTTATGGC
684AF     ATATCCCGGGTAAGGGTGAAGACCGCCATCAG
684AR     GATCGTCGACACGACGGACGCAACCTAAAC
684BF     GATCGTCGACTAAGGTCGCCGCCTGAGTTC
684BR     GATCAAGCTTCGCGCTCGCGTCCGTATTC
1592AF    CATCAAGCTTCGGTGCGCGACATCATCCTC
1592AR    CATCCTGCAGTGGTCGTTCCTCTCGTGTTC
1592BF    TAATGGATCCCAGCGCGTGTCGAGCTGAAC
1592BR    ATTAGAATTCGCGGAGCCCGCGTGGAAAG
684TF     CACATGCAGGCGGTAGATCG
684TR2    CGACCAGCAGGATCATCAG
1592TF    ACCCTTCTCACGCAATCCC
1592TR    ACGTTGATCGGCCTCACTC

*Introduced restriction sites are underlined when relevant.
CORRECTIONS & AMENDMENTS


CORRIGENDUM 
doi:10.1038/nature14554 


Corrigendum: A diverse range of gene products are effectors of the type I interferon antiviral response


John W. Schoggins, Sam J. Wilson, Maryline Panis, 
Mary Y. Murphy, Christopher T. Jones, Paul Bieniasz 
& Charles M. Rice 


Nature 472, 481-485 (2011); doi:10.1038/nature09907 


We recently discovered that the WNV-GFP stock used to generate the West Nile virus (WNV) data set in this Letter (Fig. 2 and Supplementary Table 8 of the original Letter) was actually Venezuelan equine encephalitis virus (VEEV-GFP). The error has been traced to a technical mistake made during the virus production process. It affected the panels labelled WNV in Fig. 2, Supplementary Fig. 9 and Supplementary Table 8 of the original Letter; the Supplementary Information to this Corrigendum contains the corrected WNV data sets. The error also affected figure labels in our later work¹ (see the Corrigendum to ref. 1).


1. Schoggins, J. W. et al. Pan-viral specificity of IFN-induced genes reveals new roles for
cGAS in innate immunity. Nature 505, 691-695 (2014); corrigendum Nature 
http://dx.doi.org/10.1038/nature14555 (2015). 


Supplementary Information is available in the online version of the paper. 


CORRIGENDUM 
doi:10.1038/nature14555


Corrigendum: Pan-viral specificity of IFN-induced genes reveals new roles for cGAS in innate immunity


John W. Schoggins, Donna A. MacDuff, Naoko Imanaka, Maria D. Gainey, Bimmi Shrestha, Jennifer L. Eitson, Katrina B. Mar, R. Blake Richardson, Alexander V. Ratushny, Vladimir Litvak, Rea Dabelic, Balaji Manicassamy, John D. Aitchison, Alan Aderem, Richard M. Elliott, Adolfo Garcia-Sastre, Vincent Racaniello, Eric J. Snijder, Wayne M. Yokoyama, Michael S. Diamond, Herbert W. Virgin & Charles M. Rice


Nature 505, 691-695 (2014); doi:10.1038/nature12862 


In this Letter, we carried out bioinformatic analyses on interferon-stimulated gene screening data sets for multiple viruses, including a data set for West Nile virus (WNV) (Supplementary Table 8 in ref. 1). We recently discovered that the WNV-GFP stock used in our 2011 study¹ was actually Venezuelan equine encephalitis virus (VEEV-GFP). The error has been traced to a technical mistake made during the virus production process. Several data sets in this Letter are therefore mislabelled. In Fig. 3a and in all panels of Extended Data Fig. 2a, ‘WNV’ should be ‘VEEV’. The original figure legends remain valid, as do all the other figures in this Letter. One conclusion of the Letter highlighted differences in interferon-stimulated gene specificity between positive-sense and negative-sense RNA viruses. Because VEEV and WNV are both positive-sense RNA viruses, this conclusion remains unchanged; all other results and conclusions are also unchanged.


1. Schoggins, J. W. et al. A diverse range of gene products are effectors of the type I
interferon antiviral response. Nature 472, 481-485 (2011); corrigendum Nature 
http://dx.doi.org/10.1038/nature14554 (2015). 


CORRIGENDUM 
doi:10.1038/nature14608 


Corrigendum: Greenland 
supraglacial lake drainages 
triggered by hydrologically 
induced basal slip 


Laura A. Stevens, Mark D. Behn, Jeffrey J. McGuire, 
Sarah B. Das, Ian Joughin, Thomas Herring, 
David E. Shean & Matt A. King 


Nature 522, 73-76 (2015); doi:10.1038/nature14480 


In Fig. 2d and f of this Letter, the labels ‘10° N m’ should read ‘10!” Nm’. 
(Basal moment is correctly reported in Extended Data Table 1.) 
Figure 2 has been corrected online. 
ILLUSTRATION BY THE PROJECT TWINS


TOOLBOX 


SEE HOW THEY RUN


Software tools that track how animals move are helping researchers to do 
everything from diagnosing neurological conditions to illuminating evolution. 


BY BOER DENG 


Palaeontologist Stephen Gatesy wants to bring extinct creatures to life — virtually speaking. When he pores over the fossilized skeletons of dinosaurs and other long-dead beasts, he tries to imagine how they walked, ran or flew, and how those movements evolved into the gaits of their modern descendants. “I’m a very visual guy,” he says.
But fossils are lifeless and static, and can only tell Gatesy so much. So instead, he relies on XROMM, a software package that he developed with his colleagues at Brown University in Providence, Rhode Island. XROMM (X-ray Reconstruction of Moving Morphology) borrows from the technology of motion capture, in which multiple cameras film a moving object from different angles, and markers on the object are rendered into 3D by a computer program. The difference is that XROMM uses not cameras, but X-ray machines that make videos of bones and joints moving inside live creatures such as pigs, ducks and fish. Understanding how the movements relate to the animals’ bone structure can help palaeontologists to determine what movements would have been possible for fossilized creatures. “It’s a completely different approach” to studying evolution, says Gatesy.

XROMM, released to the public in 2008 as an open-source package, is one of a number of software tools that are expanding what researchers know about how animals and humans walk, crawl and, in some cases, fly (see ‘Movement from inside and out’). That has given the centuries-old science of animal motion relevance to a wide range of fields, from studying biodiversity to designing leg braces, prostheses and other assistive medical devices. “We’re in an intense period of using camera-based and computer-based approaches to expand the questions we can ask about motion,” says Michael Dickinson, a neuroscientist at the California Institute of Technology in Pasadena.

> NATURE.COM
For more on scientific software, apps and online tools, visit: nature.com/toolbox

To use and develop effective software, 
however, scientists must learn how to adapt 
broad, open-source tools to their own needs 
— and when to build their own. 


A VISUAL HISTORY
The boom in motion-tracking tools has come 
about in part because of improvements in what 
researchers can see and measure. The first 
studies of animal and human motion, dating 
back to Aristotle, relied on naked-eye observa- 
tions, anatomy and detailed pictures drawn by 
hand. In the nineteenth century, the science of 
biomechanics was boosted by photography — 
perhaps most famously in a series of images 
of a galloping horse taken by British photo- 
grapher Eadweard Muybridge, and published 
in his Animal Locomotion collection in 1887. 

Higher-speed cameras eventually improved 
what could be captured. But movement studies 
still needed a person to look through the results 
frame by frame, laboriously tracing the arc of 
each step, arm swing or wing flap to extract 
information about angles and forces. Much of 
that tedium can now be relieved by computers 
or other measuring tools. But such tools are 
often expensive, and even today, many research- 
ers do without them. Gatesy recalls a graduate 
student's surprise at the low-tech approach that 
was used to study gait in rodents a few years ago: 
“It wasn’t uncommon just to dip their feet into some ink, have them leave some tracks and take measurements from those,” he says.

Lately, however, scientists have been coming up with methods that are much more sophisticated without being too expensive. In July, developmental biologists Richard Mann and César Mendes at Columbia University in New York City and their colleagues published a paper on MouseWalker: a system they have built to automatically analyse changes in a mouse’s gait (C. S. Mendes et al. BMC Biol. 13, 50; 2015). It involves an inexpensive set-up in which a mouse walks on a transparent surface over a high-speed camera that records the animal’s footfalls. An analytical technology called machine vision allows the MouseWalker software to discern details such as the position of each step relative to the mouse’s body.

Mendes says that this information can be 
used to detect when something goes wrong 
with gait, as can happen with the onset of neu- 
rological illnesses such as Parkinson's disease. 
MouseWalker was adapted from FlyWalker, 
a system that Mendes and his team helped to 
develop to let neuroscientists track how fruit 
flies walk after their neurons have been manip- 
ulated. Both MouseWalker and FlyWalker are 
open source: the authors hope that making the 
software available for free will help to attract 
users who can add parameters that they had 
not thought of. 


BUILDING USER COMMUNITIES 
The desire to share tools is common to many 
developers, so motion-tracking software is find- 
ing applications in a number of fields — some- 
times in unexpected ways. “One of the things 
we hope for is that people will use what we 
develop and go in a new direction with it,” says 
Jen Hicks, an engineer at Stanford University 
in California, who helps to manage OpenSim 
— an open-source software package that allows 
users to model joints, muscles and how they 
move. OpenSim has more than 20,000 users, 
and part of Hicks’s job is to organize workshops 
and tutorials to guide this growing community. 
The OpenSim community serves as an exemplar of what newer programs such as XROMM or MouseWalker could become. The software models musculoskeletal systems, and researchers have used it to simulate everything from the potential outcomes of surgery to the muscular forces of goats. Since the first version of OpenSim was released in 2007, the package has gone through dozens of upgrades that have added features and improved the algorithms used for calculations. It has been downloaded more than 100,000 times. “It’s amazing how much the community has grown,” says mechanical engineer Katherine Steele of the University of Washington in Seattle, who first began using OpenSim while studying cystic fibrosis as a graduate student at Stanford.

Movement from inside and out

Scientists monitoring animal motion use a variety of programs to automate the process.
• MouseWalker (go.nature.com/hugtxa) is a gait-tracking system that can help researchers studying the connection between neuroscience and movement. The open-source software package was released in July. Previously, the developers collaborated on FlyWalker (go.nature.com/jfwgto), a tool for measuring fly walking.
• X-ray Reconstruction of Moving Morphology, or XROMM (go.nature.com/58bn2s), helps researchers to visualize animals’ bones and joints as 3D moving skeletons. XMA Lab (go.nature.com/wz1msi), the latest version of the XROMM software, was released in December 2014.
• OpenSim (go.nature.com/1ulnpq) is an open-source program that allows researchers to model muscles, bones and the forces that act on them. Some researchers have used it to simulate the outcomes of surgery or to test hypotheses about movement pathologies such as those that affect people with cystic fibrosis or Parkinson’s disease.

Serving an ever-larger crowd requires 
careful planning to make the program acces- 
sible, says Hicks. Through grants from the 
US National Institutes of Health, she and her 
colleagues keep manuals up to date with the 
new releases. Ensuring that the software can 
be tailored to a researcher’s particular needs 
has helped new users to embrace it, she says. 


THE LIMITS OF BROAD PLATFORMS 

XROMM’s developers are in the middle of building up the infrastructure to make the software accessible to a wider community, for instance setting up a site to host the newest open-source version, XMA Lab, which became available in December 2014. The team has tried to make the latest versions of the software easier for new users. For example, says Elizabeth Brainerd, a colleague of Gatesy at Brown, “There used to be about 20 pieces of information you had to keep track of,” including items such as calibration measurements. “But now it’s all integrated.”

It is important not to make things too easy, says Steele: if the software does too much of the work, there is a risk that the researchers will misunderstand the data that it spits out. However, as an open-source program develops, understanding its architecture can get very complicated. “Sometimes the software can get so big that it becomes black-box-ish. Then it might be better for you to build your own,” she says.

Dickinson agrees, and says that sometimes, modifying open-source tools is not enough. “As science is becoming more quantitative, we’re all working on finer slices of the pie,” he says. “If you only got to use a microscope that someone else built, so to speak, you won’t be able to get as far.”

Regardless of what tools are available, researchers intend to keep expanding the applications of motion tracking. Hicks anticipates seeing more people using the tools to explore neural control and robotics designs. And she expects the software to keep improving. “We’re finding ways to learn from even messier motion data, like from accelerometers in your phone,” she says. “Bringing together more machine learning and biomechanics — that will be the next step.” ■


Boer Deng is a journalist in Washington DC. 
Mind wide open


An innovative US National Institutes of Health programme 
aims to expose junior scientists to different career paths. 


BY PAUL SMAGLIK 


Internships are not just for undergraduates any more. With fewer than one-third of US life-science PhD graduates destined for tenure-track positions, most graduate students and postdoctoral researchers need to prepare themselves for life outside the academic laboratory. In 2013, the US National Institutes of Health (NIH) launched the Broadening Experience in Scientific Training (BEST) programme, after being faced with a number of reports suggesting that the US graduate-education system trains scientists for faculty positions that do not actually exist (see go.nature.com/buk6km). The initiative began as a five-year pilot to offer biomedical graduate students and postdocs supplemental skills to help to prepare them for non-tenure-track career options.

Since its launch, BEST has offered enhanced career training, including internships, to about 10,000 graduate students and 600 postdocs.
The 17 universities that received NIH grants to participate in the pilot programme act as a collective laboratory, exploring different approaches to redefine graduate training and craft internships for highly trained young scientists. The approaches may include partnerships with industry and other sectors; participant institutions are sharing their experiences to determine best practices. So far, BEST trainees have engaged in projects such as working in the universities’ technology-transfer offices, teaming up with professional science writers and lobbying state legislatures.

Patricia Labosky, who is the programme 
leader for BEST at the NIH in Bethesda, 
Maryland, says that neither career training 
nor internships are an entirely new endeavour 
for graduate schools. However, she adds that 
the agency’s approach is innovative — it could 
transform graduate internships and career 
training from an ad hoc approach to a more- 
systematic, data-driven one. 

The best of the BEST projects will probably be adopted by more universities once the pilot expires in 2018, says Kathy Gould, a biologist who runs the BEST programme at Vanderbilt University in Nashville, Tennessee. “What we’re seeing is an experiment in progress,” she says. Labosky predicts that market forces could speed up adoption because BEST universities use the initiative as a student-recruiting tool.


FACULTY FIRST 

Faculty-member buy-in to the programme is 
essential because many principal investiga- 
tors (PIs) look at students’ or postdocs’ time 
away from the lab as detrimental — especially 
if those young scientists are funded under the 
PI’s grant. But some have come around, says 
Ambika Mathur, who teaches paediatrics at 
Wayne State University in Detroit, Michigan, 
and is dean of its graduate school. When her 
institution surveyed science faculty members 
to gauge reaction to off-site internships for 
trainees, most PIs said that they were favour- 
ably or very favourably inclined. 

One solution to objections from PIs is to fund trainees who are on internships from departmental money or training grants, which are not tied explicitly to research outcomes. Some BEST universities reimburse PIs for the time that their trainees spend away from the lab. But such a change does not happen quickly or easily. “Shifting trainees off research grants is going to take a long time to accomplish,” says Nael McCarty, who manages Emory University’s BEST programme in Atlanta, Georgia. And tweaking funding sources alone will not address PIs’ need for young scientists to get experiments done. “Our entire careers are resting on the backs of our trainees,” he points out.


SPLIT THE DIFFERENCE 
Emory is quelling some faculty members’ 
objections by casting internships as separate, 
independent projects with flexible hours. The 
university also sets a low bar on the schedule 
and time requirements for internships. BEST 
advisers first sign off on their students and 
postdocs joining the programme, with the 
understanding that the trainees will spend at 
least 50 hours each semester or during their 
summer academic break away from the lab. 
Those hours could be compressed into a month or even a week, or they could be spread out over a semester or a year. Once trainees find potential placements, they negotiate their internship schedules with their PIs.

This negotiation was a fraught process for Chelsey Ruppersburg. As a PhD student, also at Emory, she created a public-policy internship for herself in Emory’s office of government and community affairs that required 20 hours a week at the Georgia State Capitol, where she advocated to the state legislature on behalf of the university.

“It started with a frank conversation with my PI — that I would be out of his lab during traditional hours,” she says. “I was going to do everything on my end to make sure this wouldn’t harm my PI and my work in his lab.” They agreed that she could spend two or three days each week at the Capitol from January to April during Georgia’s legislative session. She made up lab hours at nights and on weekends, hustling to finish her doctoral dissertation on cell biology while learning the ropes of advocacy. The extra work paid off: Ruppersburg started a post last month as a fundraising staffer in the political campaign for US senator Johnny Isakson.

The PI is not the only one who can pose 
an obstacle to crafting a useful internship. 


BEST CALCULATIONS 


The rule of threes 


Douglas White knows how to work the numbers — and how to make them work for him. About halfway through his doctoral programme in biomedical engineering at the Georgia Institute of Technology (Georgia Tech) in Atlanta, he realized that the odds of landing a tenure-track position were not on his side. To prepare for a career with better chances of employment, White turned to the US National Institutes of Health’s Broadening Experience in Scientific Training (BEST) programme.

Today, thanks to three internships, White works as a project manager at Takeda Pharmaceuticals, a Japan-based drugmaker with a presence in Atlanta, Georgia. “People always talk about how internships lead to job opportunities, but I didn’t believe it,” he says.

BEST offered flexible internship 
opportunities that allowed him to 
experience different paths over about 
18 months. First, he completed a writing 
apprenticeship sponsored by the 
philanthropic W. M. Keck Foundation in Los 
Angeles, California, which paired him with 
two professional science writers, put him in 
touch with about a dozen science-writing 
students around the country and sent him 
to a scientific conference where he reported 
on its proceedings and was critiqued by 
professionals. 

He learned that he loves writing, but did not want to pursue it as a career. Equally important, he learned how to tailor his message for people from different backgrounds and with various levels of scientific knowledge.

Next, he decided to look into the 
government sector, and the Atlanta BEST 
office came through with an application for 
an internship at the US Defense Forensic 
Science Center, about 24 kilometres from 
Georgia Tech in Forest Park. 

During the interview, White was excited to 
learn that the centre used a technology he 
was interested in. However, after two months, 
he realized that the hours out of the lab 
were cutting into his dissertation research 
and that he was spending little time on the 
aspects of the work that were most relevant 
to his training and research needs. 

The experience gave him a crash course 
in negotiating an early exit. He also learned 
that he needed to consider work-life 
balance, so when he landed an interview 
for his next internship with Takeda, he told 
them that he was scheduled to defend his 
dissertation in a few months and was getting 
married a week after that. He won the 
internship, defended his thesis and left for 
his honeymoon — and when he returned, 
his department had been restructured 
and his new boss offered him a job as a 
project manager. 

White says that the three internships changed his life by allowing him to explore multiple career options in a relatively short amount of time — and landed him a full-time job in the process. “I would not be where I am if it wasn’t for the BEST programme,” he says. P.S.
Finding employers who offer the kind of flexibility that graduate- and postdoc-level interns require can be tricky, says Gould. She advises trainees to think carefully about what they want from an internship, and she asks prospective partners to plan projects that are appropriate for a graduate student’s or postdoc’s skill level and training needs. “Some people in industry, on the surface, are very enthusiastic,” she says. But they often prove unable to write a job description for a PhD-level scientist.

To sidestep these challenges, some BEST 
trainees create internships through their 
own university, like Ruppersburg did. Most 
universities can offer opportunities in areas 
such as science writing and intellectual- 
property management. Many BEST schools 
have found it easier to place students in intern- 
ships on campus than offsite, especially if they 
are not located in or near a major technology 
hub, says Labosky. 

For example, the BEST programme at 
New York University (NYU) works with the 
university’s technology-transfer office. A formal 
internship charges the trainee with drafting a 
business plan around a particular piece of 
technology. A less-formal option pairs gradu- 
ate students and postdocs with companies to 
write marketing summaries, gather competi- 
tive intelligence and perform outreach. Neither 
option requires fixed hours, but before gradu- 
ate students or postdocs can participate, they 
must complete a technology-commercializa- 
tion course. At this point, about 40 students 
and postdocs have interned with NYU's tech- 
transfer office in one of these capacities. 

Other approaches to internships will arise 
as the BEST programme adapts to the needs of 
trainees, PIs and internship sponsors, says Keith 
Micoli, director of NYU’s medical school and 
co-PI of its BEST grant. Short internships can 
be effective if they help a trainee to choose or 
rule out a particular career pathway, he adds. 
By encouraging trainees to explore different options — even through simple things such as job shadowing — these programmes could help to ease the bottleneck of graduate students and postdocs who do not know what they want to do once they complete their programme, he adds (see ‘The rule of threes’). “One of the most frustrating things I see,” he says, “is graduate students who complete their PhDs and say, ‘I suppose I should do a postdoc and figure out what I need to do.’”

As the BEST model expands, Micoli and others hope to hear that less often. ■


Paul Smaglik is a freelance writer in 
Milwaukee, Wisconsin. 


TURNING POINT 
Anja Rammig


Ecosystem modeller Anja Rammig started in 
June as an assistant professor at Germany's 
Technical University of Munich (TUM), which 
in 2012 adopted a tenure-track scheme. 


Was there a pivotal moment in your career? 

I was working on my undergraduate degree in 
zoology, and I was so excited to learn it was pos- 
sible to do computer simulations of ecosystems 
that I almost changed my course of study. My 
adviser suggested that instead, I pursue a PhD 
thesis focused on computer modelling. It was 
the most important move in my scientific life. 


How did you come to specialize in forests? 
During my PhD programme at the Swiss 
Federal Institute of Technology in Zurich, I 
worked with researchers at the Institute for 
Snow and Avalanche Research. Switzerland 
uses forests as avalanche protection, and 
researchers had collected data after a strong 
windstorm in 1990 that had killed many trees 
in the country. They wanted to learn how long 
it would take for the forest to regenerate. It was 
my first experience with modelling, and it con- 
vinced me that I wanted to continue this type 
of work but with a focus on global problems. 


What was your seven-year experience at the Potsdam Institute for Climate Impact Research (PIK) in Germany like?

Working at this world-renowned institute was my introduction to the big world of science policy. In the first month, I began estimating the large-scale die-off of the Amazon rainforest as a result of climate change. I gave presentations at the World Bank and other organizations.


Did you get any coaching on communication? 
I had discussions with colleagues about how to
communicate the science, but I got no specific 
coaching. For me, science communication was 
very new, and in my first two years at PIK, it 
was difficult to do. But I learned what audi- 
ences expected and what level of information 
worked best. It was the culture of the institute 
to learn by watching more-experienced col- 
leagues. I determined that it is really impor- 
tant to read a lot to prepare for questions from 
scientists, stakeholders or politicians — and to 
know the Intergovernmental Panel on Climate 
Change reports almost by heart. 


How is it being a woman in such a male-dominated field?

PIK was trying to increase the number of women in high positions; they were very fair when it came to parental leave and work-life balance. I was in the group at PIK with the highest percentage of women, incidence of maternity leave and employees with children. My last three years there, I had a female boss and worked with five other women. Last year, I applied for a position in Germany that aimed to attract female applicants. I was pregnant at the time, and the date for my presentation was my due date. Ironically, they wouldn’t move my presentation date, so they didn’t consider me.


Can you describe TUM’s tenure-track scheme? 
The criteria for how you will be evaluated — on research, teaching and public engagement — are clearly spelled out. Research criteria include developing methodologies and concepts, securing external funding and showing that you are building an international reputation. It’s not like the impression you may get at other institutions: that is, that the process is not transparent or lacks defined criteria. Tenure-track professorships in all disciplines have the same criteria.


What landed you the position at TUM? 

I really wanted my own research group to keep 
studying the Amazon's ecophysiology and how 
it might change. I’m interested in modelling 
it with data from experimental studies. Fortu- 
nately, while I was at PIK, I established a huge 
network of collaborations and connections, 
especially with Brazilian scientists. I am on 
the scientific committee of a large collabora- 
tion to build a big experiment in the Amazon 
rainforest that will test the impact of increas- 
ing carbon dioxide. I think my connections 
helped me, even though competition for the 
job at TUM was high. ■


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for length and clarity. 
SCIENCE FICTION

TIME FLIES

The catch of a lifetime.

BY CARIE JUETTNER

“Dang! I almost had it.”

“No, you didn’t. You were a mile away.”

“Was not. It was a good one too. It was tiny.”

“That’s a myth, you know,” I said, swiping at a medium-sized green one with my baseball cap. “Size doesn’t matter.”

“Yeah it does, Kat.” Jeremy put his net away and got out a pair of chopsticks. I rolled my eyes. “The small ones are always worth more. Haven’t you ever noticed that the big slow ones that are so easy to catch are never worth more than an hour or two?”

“So? I’m starting to think there are no valuable ones anymore. I think they went extinct or something.”

“No, they’re out there. My friend Jon caught a 17-year one once!”

“Oh yeah? Did you actually see it?” I swiped my foot through the tall grass, trying to stir something up.

“No, but I believe him.”

“Seventeen years, huh? So does Jon look different now?”

A small iridescent blue one flew by. Jeremy made a flailing jab at it and missed. “No. It got away before he could swallow it.”

“Pshh. Then he was lying.” I made an attempt at the blue one, but it was already out of my reach. “There’s nothing bigger than a year out there anymore. Sometimes I think there’s nothing bigger than a week.”

“If you really think that, then why do you even catch?”

I shrugged. “Every little bit helps.”

We were quiet for a while. A thick-bodied brown buzzer lifted clumsily out of the grass and I grabbed it without even using my cap. I flipped it over. Three hours. Turning my back to Jeremy, I popped the whole thing in my mouth at once. I bit down and the buzzing stopped as the earthy metallic taste flooded my tongue. I swallowed and waited, eyes closed, for the sensation I knew would come. I had to concentrate or I would miss it. A quick burst of energy and warmth passed through me, and then it was gone. I opened my eyes.

“You know,” Jeremy said, “my grandfather hated catching. He wouldn’t let me do it when he was around. He said we should be happy with the time we’re born with.”

I scoffed. “And where is he now?” 

“He's dead.” 

I nodded. It was starting to get dark. There’d be more coming soon, but they’d be harder to see.

“How long do you think you'll do it?” I 
asked. 

“Do what?” 

“Catch.”

Jeremy shrugged. “Not long. I just want
to build up a little more, you know? Bank it.
Then I’ll stop. I mean, I’m not going to do
it for ever.”

“Me neither.” I knew it was rude to ask, but
I couldn't help myself. “How much do you 
have? You know, in the bank?” 
Jeremy examined his shoes. “A good
amount. I’m comfortable.”

I nodded.

“Yeah,” he said. “Just one more
good catch, and I’ll hang up my
net.” He smiled at me. “And
my chopsticks.”

I smiled back.


I heard a high-pitched 
buzzing behind me. I turned 
around and saw a minuscule 
flyer heading right for us. I 
pulled my cap back, but Jeremy
was faster. He dropped
his chopsticks and clasped
his bare hands around it, 
trapping it inside. “I got 
it!” he yelled. “I actually 
got it!” He whooped and 
jumped up and down. “It’s 
the smallest one I’ve ever 
seen!”

“So look at it already.
What is it?” I crossed my 
arms to keep them from shak- 
ing. What if the myth was true 
after all? 

Jeremy moved his hands care- 
fully until he had the tiny creature 
pinned firmly between his fingers. He 
flipped it over and peered at the underside. 
For just a moment, his face fell. Then his 
smile was back. He popped the thing in his 
mouth, swallowed loudly and stood still for 
a long time. I let him have his moment, even
though it was killing me to wait. What was
it? Ten years? Twenty? Finally, he let out a
deep sigh and looked at me.

“How much?” I asked.

“Six.”

“Six years?” 

He nodded. My arms relaxed a little. 

“That’s a pretty good catch,” I said.

He nodded again, smiling slightly. “Yeah.” 

“So,” I said, “since you're not going to be 
needing that net ...?” I held my hand out 
towards him. 

A hum filled the air as several large flyers 
and a few medium-sized ones emerged from 
the grass. Jeremy shuffled his feet, his fin- 
gers gripping the net. “Actually... I think... 
maybe I’ll try for just one more.”


Carie Juettner is a writer in Austin, Texas. 
Her fiction has appeared in Hello Horror, 
Dark Moon Digest, Microhorror and 
Writers Weekly, among other places. Follow 
her blog at www.cariejuettner.com. 